GEP-PG: 在深强化学习中分离探索和利用 (GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms)

In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient descent. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. We show that DDPG fails on the former and that GEP-PG improves over the best DDPG variant in both environments. Supplementary videos and discussion can be found at http://frama.link/gep_pg, the code at http://github.com/flowersteam/geppg.

翻译：在连续的行动领域,像DDPG这样的标准的深层强化学习算法在面临稀少或欺骗性的奖励问题时受到效率低下的探索。相反,侧重于新发现搜索、质量多样性或目标探索过程等探索的进化和开发方法更强有力地探索,但在使用梯度下降的微调政策方面效率较低。在本文中,我们介绍了GEP-PG方法,将目标探索进程与DDPG的两个变体相接轨,从而在两个世界中取得最佳效果。我们研究了这些组成部分的学习表现及其在低维度欺骗性奖励问题和较大的半切塔基准上的结合。我们表明,DDPG在前者上失败,GEP-PG在两种环境中都比最佳DDPG变体改进了。补充视频和讨论见http://frama.link/gep_pg,代码见http://githhub.com/flowsteam/geppg。

相关内容

深度强化学习

关注 142

深度强化学习 (DRL) 是一种使用深度学习技术扩展传统强化学习方法的一种机器学习方法。传统强化学习方法的主要任务是使得主体根据从环境中获得的奖赏能够学习到最大化奖赏的行为。然而，传统无模型强化学习方法需要使用函数逼近技术使得主体能够学习出值函数或者策略。在这种情况下，深度学习强大的函数逼近能力自然成为了替代人工指定特征的最好手段并为性能更好的端到端学习的实现提供了可能。

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

81+阅读 · 2020年2月18日

【AAAI2020教程】强化学习中的Exploration-Exploitation in Reinforcement Learning

专知会员服务

99+阅读 · 2020年2月8日

深度强化学习策略梯度教程，53页ppt

专知会员服务

176+阅读 · 2020年2月1日