Discovering Diverse Solutions in Deep Reinforcement Learning

Reinforcement learning (RL) algorithms are typically limited to learning a single solution to a specified task, even though diverse solutions to a given task often exist. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution. Although previous studies have shown that diverse behaviors can be modeled with a policy conditioned on latent variables, an approach for modeling an infinite set of diverse solutions with continuous latent variables has not been investigated. In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. On continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the solutions can be used for few-shot adaptation to solve unseen tasks.
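
To make the idea of "a policy conditioned on a latent variable" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small, untrained two-layer network with random weights, a latent variable drawn uniformly from [-1, 1], and hypothetical names (`LatentConditionedPolicy`, `sample_latent`). It shows only how the latent variable enters the policy input; the training objective that makes different latent values correspond to different useful solutions is not described in the abstract and is not reproduced here.

```python
import math
import random


def make_layer(rng, in_dim, out_dim):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def forward(layer, inputs):
    """Apply a dense layer followed by a tanh nonlinearity."""
    weights, biases = layer
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


class LatentConditionedPolicy:
    """Deterministic policy a = pi(s, z); hypothetical illustration only."""

    def __init__(self, state_dim, latent_dim, action_dim,
                 hidden_dim=16, seed=0):
        rng = random.Random(seed)
        # The network input is the state concatenated with the latent z.
        self.hidden = make_layer(rng, state_dim + latent_dim, hidden_dim)
        self.output = make_layer(rng, hidden_dim, action_dim)

    def act(self, state, z):
        """Return an action in [-1, 1] for the given state and latent."""
        features = forward(self.hidden, list(state) + list(z))
        return forward(self.output, features)


def sample_latent(rng, latent_dim):
    """Assumed prior: each latent coordinate is uniform on [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]


policy = LatentConditionedPolicy(state_dim=4, latent_dim=2, action_dim=1)
latent_rng = random.Random(1)
state = [1.0, 1.0, 1.0, 1.0]

# Two different latent values select two different behaviors
# from the same network in the same state.
z_a = sample_latent(latent_rng, 2)
z_b = sample_latent(latent_rng, 2)
action_a = policy.act(state, z_a)
action_b = policy.act(state, z_b)
```

Because the latent variable is continuous, every value of `z` indexes a distinct behavior of the same network, which is the sense in which a single conditioned policy can represent infinitely many solutions. Few-shot adaptation could then amount to searching over `z` alone while the network weights stay fixed; that reading is an interpretation of the abstract, not a confirmed detail of the method.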
