Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance

Complex and contact-rich robotic manipulation tasks, particularly those that involve multi-fingered hands and underactuated object manipulation, present a significant challenge to any control method. Methods based on reinforcement learning offer an appealing choice for such settings, as they can enable robots to learn to delicately balance contact forces and dexterously reposition objects without strong modeling assumptions. However, running reinforcement learning on real-world dexterous manipulation systems often requires significant manual engineering, which negates the autonomous data collection and ease of use that reinforcement learning should, in principle, provide. In this paper, we describe a system for vision-based dexterous manipulation that provides a "programming-free" way for users to define new tasks, and that enables robots with complex multi-fingered hands to learn to perform those tasks through interaction. The core principle underlying our system is that, in a vision-based setting, users should be able to provide high-level intermediate supervision that circumvents the challenges of teleoperation or kinesthetic teaching, and that allows a robot not only to learn a task efficiently but also to practice it autonomously. We present a framework for users to define a final task and intermediate sub-tasks with image examples, a reinforcement learning procedure that learns the task autonomously without interventions, and experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world, without simulation, manual modeling, or reward engineering.
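The abstract describes users defining a final task and intermediate sub-tasks with image examples, which a reinforcement learning procedure then uses to learn and practice without interventions. The Python sketch below illustrates one possible way such example-defined sub-tasks could drive rewards and stage transitions; it is not the paper's implementation. Everything in it is a hypothetical assumption: `SubtaskClassifier` stands in for a learned image success classifier (here an observation is a plain feature vector scored by its distance to the mean of the user's examples), and `SubstepScheduler`, the `0.9` threshold, and the forward/backward ordering are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SubtaskClassifier:
    """Hypothetical stand-in for a success classifier trained on user-provided
    example images of one sub-task. Observations are feature vectors, and the
    success score decays with squared distance to the mean of the examples."""
    examples: List[List[float]]
    scale: float = 1.0

    def prototype(self) -> List[float]:
        # Mean of the example feature vectors, one entry per feature dimension.
        n = len(self.examples)
        return [sum(col) / n for col in zip(*self.examples)]

    def success_prob(self, obs: Sequence[float]) -> float:
        # Score in (0, 1]; equals 1.0 exactly at the prototype.
        dist2 = sum((o - p) ** 2 for o, p in zip(obs, self.prototype()))
        return math.exp(-dist2 / self.scale)


@dataclass
class SubstepScheduler:
    """Hypothetical scheduler: the reward is the success score of the sub-task
    currently targeted. Once that sub-task is detected as achieved, target the
    next one, reversing direction at either end of the chain so the robot keeps
    practicing (toward the final task, then back) without a manual reset."""
    classifiers: List[SubtaskClassifier]  # ordered: first sub-task ... final task
    threshold: float = 0.9
    stage: int = 0       # index of the sub-task currently targeted
    direction: int = 1   # +1 = toward the final task, -1 = undoing sub-tasks

    def __post_init__(self) -> None:
        if len(self.classifiers) < 2:
            raise ValueError("need at least two sub-tasks to alternate between")

    def step(self, obs: Sequence[float]) -> float:
        r = self.classifiers[self.stage].success_prob(obs)
        if r >= self.threshold:
            nxt = self.stage + self.direction
            if not 0 <= nxt < len(self.classifiers):
                # Reached either end of the chain: turn around.
                self.direction = -self.direction
                nxt = self.stage + self.direction
            self.stage = nxt
        return r  # reward that an RL algorithm (not shown) would maximize


if __name__ == "__main__":
    tasks = [SubtaskClassifier([[0.0, 0.0], [0.2, 0.0]]),
             SubtaskClassifier([[1.0, 1.0]]),
             SubtaskClassifier([[2.0, 2.0]])]
    sched = SubstepScheduler(tasks)
    for obs in ([0.1, 0.0], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0]):
        sched.step(obs)
    print(sched.stage, sched.direction)  # 0 -1
```

Reversing direction at the final task is the illustrative stand-in for "practice without interventions": after completing the chain, the agent is rewarded for undoing sub-tasks, which returns the scene toward its starting configuration. A real system would replace the distance-based scorer with a classifier trained on camera images and feed the rewards into an RL algorithm.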
