Teacher-Student Curriculum Learning for Reinforcement Learning

Reinforcement learning (RL) is a popular paradigm for sequential decision-making problems. Over the past decade, advances in RL have led to breakthroughs in many challenging domains, such as video games, board games, robotics, and chip design. However, the sample inefficiency of deep RL methods remains a significant obstacle to applying RL to real-world problems. Transfer learning has been applied to RL so that knowledge gained in one task can be reused when training on a new task. Curriculum learning is concerned with sequencing tasks or data samples so that knowledge can be transferred between them, making it possible to learn a target task that would otherwise be too difficult to solve. Designing a curriculum that improves sample efficiency is itself a complex problem. In this thesis, we propose a teacher-student curriculum learning setting in which a teacher that selects tasks for the student is trained simultaneously with the student, which learns to solve the selected tasks. Our method does not depend on human domain knowledge or manual curriculum design. We evaluate our method on two reinforcement learning benchmarks: a grid world and the challenging Google Football environment. Compared to tabula-rasa reinforcement learning, our method improves the student's sample efficiency and generality.
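To make the teacher-student interaction concrete, the loop can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of the thesis: the abstract does not say how the teacher is trained, so the teacher here is a simple epsilon-greedy bandit rewarded by the student's learning progress, and the student is a toy stand-in with made-up learning dynamics rather than an RL agent. All names (`ToyStudent`, `BanditTeacher`, `run_curriculum`) and all constants are hypothetical.

```python
import random


class ToyStudent:
    """Toy stand-in for an RL student: one skill level in [0, 1] per task."""

    def __init__(self, num_tasks):
        self.skill = [0.0] * num_tasks

    def train_on(self, task):
        """Train on one task and return the student's score on it afterwards."""
        # Assumed dynamics (not from the source): a task only becomes
        # learnable once the previous, easier task is partly mastered.
        prerequisite = 1.0 if task == 0 else self.skill[task - 1]
        self.skill[task] += 0.2 * prerequisite * (1.0 - self.skill[task])
        return self.skill[task]


class BanditTeacher:
    """Epsilon-greedy teacher that tracks learning progress per task."""

    def __init__(self, num_tasks, epsilon=0.2, step_size=0.3, seed=0):
        self.values = [0.0] * num_tasks      # estimated learning progress
        self.last_score = [0.0] * num_tasks  # last observed student score
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)

    def select_task(self):
        # Explore a random task with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Otherwise pick a task with the highest estimated progress,
        # breaking ties at random.
        best = max(self.values)
        candidates = [i for i, v in enumerate(self.values) if v == best]
        return self.rng.choice(candidates)

    def update(self, task, score):
        # Teacher reward: absolute change in the student's score on the task.
        progress = abs(score - self.last_score[task])
        self.last_score[task] = score
        self.values[task] += self.step_size * (progress - self.values[task])


def run_curriculum(num_tasks=4, steps=300, seed=0):
    """Train teacher and student together; return the student and task history."""
    student = ToyStudent(num_tasks)
    teacher = BanditTeacher(num_tasks, seed=seed)
    history = []
    for _ in range(steps):
        task = teacher.select_task()     # teacher proposes a task
        score = student.train_on(task)   # student trains on it
        teacher.update(task, score)      # teacher learns from the outcome
        history.append(task)
    return student, history


if __name__ == "__main__":
    student, history = run_curriculum()
    print("final skills:", [round(s, 2) for s in student.skill])
    print("first 20 selected tasks:", history[:20])
```

The design choice worth noting is that the teacher never sees a hand-written task ordering. It only observes how much the student's score changes, so any curriculum that emerges comes from the interaction itself, in line with the abstract's claim of independence from manual curriculum design.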
