Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups, in our case with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at \url{https://dextreme.org/}