A Quadratic Actor Network for Model-Free Reinforcement Learning

In this work we discuss incorporating quadratic neurons into policy networks for model-free actor-critic reinforcement learning. A quadratic neuron computes an explicit quadratic function of its inputs, in contrast to conventional neurons, where the non-linearity comes only from the activation function. We run empirical experiments on several MuJoCo continuous control tasks and find that MLP policy networks augmented with quadratic neurons outperform the baseline MLP while using fewer parameters. On average, the highest return achieved increases by $5.8\%$, and the augmented networks are about $21\%$ more sample efficient. Moreover, the quadratic policy network maintains its advantage when noise is added to actions and observations.
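The abstract does not spell out the exact form of the quadratic neuron, so the following is only a minimal sketch under an assumed generic form, $y = x^\top Q x + w^\top x + b$. The function name `quadratic_neuron` and the parameters `Q`, `w`, and `b` are illustrative assumptions, not the authors' implementation, and the paper's neuron may use a different (for example, lower-rank) parameterization.

```python
from typing import Sequence


def quadratic_neuron(
    x: Sequence[float],
    Q: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
) -> float:
    """Assumed generic quadratic neuron: y = x^T Q x + w^T x + b.

    Q is an n-by-n weight matrix for the second-order term,
    w is the usual linear weight vector, and b is the bias.
    """
    n = len(x)
    # Second-order term: sum over all pairs of inputs, weighted by Q.
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    # First-order term: the same dot product a conventional neuron computes.
    lin = sum(wi * xi for wi, xi in zip(w, x))
    return quad + lin + b


# With Q set to all zeros the neuron reduces to a conventional linear unit.
x = [1.0, 2.0]
identity_Q = [[1.0, 0.0], [0.0, 1.0]]
zero_Q = [[0.0, 0.0], [0.0, 0.0]]
w = [0.5, -1.0]
b = 0.25

y_quadratic = quadratic_neuron(x, identity_Q, w, b)
y_linear = quadratic_neuron(x, zero_Q, w, b)
```

In this form the quadratic dependence on the input is explicit in the pre-activation itself, whereas a conventional neuron only becomes non-linear once an activation function is applied to `w^T x + b`.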

