Multi-task reinforcement learning poses two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. In the case of continual reinforcement learning, a third challenge arises: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges by proposing DisCoRL, an approach combining state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheel omni-directional robot. Moreover, we test our approach's robustness by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
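To fix the idea of the policy-distillation component mentioned above, the sketch below shows a generic distillation step: a student network is trained to match a frozen teacher's action distribution on states visited by the teacher. This is only an illustrative sketch under assumed choices (discrete actions, PyTorch, a KL-divergence loss, made-up layer sizes), not the authors' actual implementation.

```python
# Hypothetical, minimal policy-distillation sketch (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 8, 4  # illustrative sizes, assumptions for this example

teacher = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
student = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(states: torch.Tensor) -> float:
    """One distillation update on a batch of states gathered with the teacher."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(states), dim=-1)   # teacher is frozen
    student_log_probs = F.log_softmax(student(states), dim=-1)
    # KL(teacher || student): pushes the student toward the teacher's policy.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random states standing in for a replay buffer.
states = torch.randn(64, STATE_DIM)
print(distill_step(states))
```

In a multi-task or continual setting, the same loss is typically applied per task, so that a single student model absorbs the policies of several task-specific teachers.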