Multi-task reinforcement learning poses two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. In the case of continual reinforcement learning, a third challenge arises: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheel omni-directional robot. Moreover, we test our approach's robustness by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
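To fix the idea of the policy-distillation component mentioned above, the sketch below shows a generic distillation step: a student network is trained to match a frozen teacher's action distribution on states visited by the teacher. This is only an illustrative sketch under assumed choices (discrete actions, PyTorch, a KL-divergence loss, made-up layer sizes), not the authors' actual implementation.

```python
# Hypothetical, minimal policy-distillation sketch (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 8, 4  # illustrative sizes, assumptions for this example

teacher = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
student = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(states: torch.Tensor) -> float:
    """One distillation update on a batch of states gathered with the teacher."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(states), dim=-1)   # teacher is frozen
    student_log_probs = F.log_softmax(student(states), dim=-1)
    # KL(teacher || student): pushes the student toward the teacher's policy.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random states standing in for a replay buffer.
states = torch.randn(64, STATE_DIM)
print(distill_step(states))
```

In a multi-task or continual setting, the same loss is typically applied per task, so that a single student model absorbs the policies of several task-specific teachers.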