A. Understanding the Complexity Gains of Single-Task RL with a Curriculum

Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
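
The idea of sequentially solving curriculum tasks can be illustrated with a toy sketch. This is not the paper's algorithm or experimental setup; it is a one-parameter example under assumptions made here for illustration. The names `reward`, `solve_task`, and `solve_with_curriculum`, the bump-shaped reward, and all numeric settings are hypothetical. The reward is nearly flat far from the goal, so plain gradient ascent from scratch barely moves. Warm-starting each task from the solution of the previous, easier task reaches the target with the same total step budget and no exploration bonus.

```python
import math


def reward(theta, goal, width=1.0):
    """Poorly shaped reward (assumed): a narrow bump at `goal`, nearly flat elsewhere."""
    return math.exp(-((theta - goal) ** 2) / (2.0 * width ** 2))


def solve_task(theta, goal, steps=200, lr=0.5, width=1.0):
    """Plain gradient ascent on a single task, with no exploration bonus."""
    for _ in range(steps):
        # Analytic derivative of the bump reward with respect to theta.
        grad = reward(theta, goal, width) * (goal - theta) / width ** 2
        theta += lr * grad
    return theta


def solve_with_curriculum(theta, goals, **kwargs):
    """Solve tasks in order, warm-starting each from the previous solution."""
    for goal in goals:
        theta = solve_task(theta, goal, **kwargs)
    return theta


target = 10.0
# Hypothetical curriculum: goals move one unit at a time toward the target.
curriculum = [float(g) for g in range(1, 11)]

# Give the single-task baseline the same total number of gradient steps.
theta_direct = solve_task(0.0, target, steps=200 * len(curriculum))
theta_curriculum = solve_with_curriculum(0.0, curriculum)

print(f"direct:     theta={theta_direct:.3g}, reward={reward(theta_direct, target):.3g}")
print(f"curriculum: theta={theta_curriculum:.3g}, reward={reward(theta_curriculum, target):.3g}")
```

In this sketch the curriculum helps only because consecutive goals are close enough that the previous solution lies where the next task's gradient is informative. That spacing is a loose stand-in for the regularity conditions on the curriculum mentioned in the abstract, not a statement of them.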
