Deep Reinforcement Learning (DRL) is emerging as a promising approach to generate adaptive behaviors for robotic platforms. However, a major drawback of using DRL is its data-hungry training regime, which requires millions of trial-and-error attempts that are impractical to run on physical robotic systems. To address this issue, we propose a multi-subtask reinforcement learning method in which complex tasks are decomposed manually into low-level subtasks by leveraging human domain knowledge. These subtasks can be parametrized as expert networks and learned via existing DRL methods. The trained subtasks can then be composed by a high-level choreographer. As a testbed, we use a pick-and-place robotic simulator to demonstrate our methodology, and show that our method outperforms an imitation learning-based method and reaches a high success rate compared to an end-to-end learning approach. Moreover, we transfer the learned behavior to a different robotic environment, which allows us to exploit sim-to-real transfer and demonstrate the trajectories on a real robotic system. Our training regime is carried out on a central processing unit (CPU)-based system, which demonstrates the data efficiency of our approach.
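To make the subtask/choreographer structure concrete, the sketch below illustrates the general idea in PyTorch: each low-level subtask (e.g., reach, grasp, place) is a small "expert" policy network that would be trained separately with an off-the-shelf DRL method, and a high-level choreographer selects which expert acts at each step. This is a minimal illustration, not the authors' implementation; all class names, network sizes, and dimensions here are hypothetical.

```python
# Minimal sketch of composing pretrained subtask experts with a high-level
# choreographer. Hypothetical names and dimensions; not the paper's code.
import torch
import torch.nn as nn


class ExpertPolicy(nn.Module):
    """One subtask policy mapping states to actions (trained separately via DRL)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class Choreographer(nn.Module):
    """High-level policy that outputs logits over the subtask experts."""

    def __init__(self, state_dim: int, num_experts: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def act(state, experts, choreographer):
    """Select the expert the choreographer prefers and run its policy."""
    with torch.no_grad():
        idx = choreographer(state).argmax(dim=-1).item()
        return experts[idx](state), idx


if __name__ == "__main__":
    state_dim, action_dim = 12, 4  # e.g., joint states in, motor command out
    # Three hypothetical subtask experts: reach, grasp, place.
    experts = [ExpertPolicy(state_dim, action_dim) for _ in range(3)]
    choreographer = Choreographer(state_dim, num_experts=len(experts))
    action, chosen = act(torch.randn(1, state_dim), experts, choreographer)
    print(f"expert {chosen} -> action {action.squeeze().tolist()}")
```

Because each expert is trained on a narrow subtask, its sample requirements stay small, which is the intuition behind the CPU-only training claim above.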