Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks, showing excellent performance. The two approaches have opposite properties: DRL offers good sample efficiency but poor stability, while ES is stable but sample-inefficient. Recently, there have been attempts to combine these algorithms, but the resulting methods rely entirely on a synchronous update scheme, which prevents them from fully exploiting the parallelism of ES. To address this challenge, an asynchronous update scheme is introduced, enabling both good time efficiency and diverse policy exploration. In this paper, we introduce Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL), which maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework that merges ES and DRL asynchronously and 2) various asynchronous update methods that take full advantage of asynchronism, ES, and DRL, namely exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmark tasks, showing superior performance as well as time efficiency compared to previous methods.
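To illustrate the core idea, below is a minimal conceptual sketch in Python of an asynchronous ES loop interleaved with a gradient-based update. It is not the AES-RL implementation: the problem dimensionality, the quadratic `fitness` placeholder, the single-sample ES estimator, and the `rl_step` stand-in for a policy gradient learner are all assumptions made for brevity. What it demonstrates is that each worker's result is folded into the shared search distribution as soon as it arrives, rather than waiting for a full synchronous generation.

```python
# Minimal conceptual sketch of an asynchronous ES + gradient-learner loop.
# NOT the AES-RL implementation: fitness(), rl_step(), and all constants
# below are placeholders chosen only to make the sketch runnable.
import numpy as np
from concurrent.futures import ThreadPoolExecutor, as_completed

DIM, SIGMA, LR, WORKERS = 8, 0.1, 0.05, 4
mean = np.zeros(DIM)  # shared mean of the search distribution (policy params)

def fitness(theta):
    # Placeholder episodic return; a real worker would roll out the policy.
    return -np.sum((theta - 1.0) ** 2)

def es_worker(seed):
    # Each worker samples and evaluates one perturbation independently, so
    # results become available one by one instead of in lock-step generations.
    eps = np.random.default_rng(seed).normal(size=DIM)
    return eps, fitness(mean + SIGMA * eps)

def rl_step(theta):
    # Stand-in for an off-policy gradient update (e.g., a TD3 learner step).
    return theta + LR * (1.0 - theta)

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    seed = 0
    for _ in range(50):
        futures = [pool.submit(es_worker, seed + i) for i in range(WORKERS)]
        seed += WORKERS
        for f in as_completed(futures):
            eps, score = f.result()
            # Asynchronous update: fold each returned fitness into the mean
            # as soon as it arrives (single-sample ES gradient estimate).
            mean += LR * score * eps / SIGMA
        mean = rl_step(mean)  # interleave the gradient learner's update
print("final mean:", mean)
```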