Intelligent manipulation benefits from the capacity to flexibly control an end-effector with many degrees of freedom (DoF) and to dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. Leveraging this data, we show that it is possible to train a robust end-to-end 6DoF closed-loop grasping model with reinforcement learning that transfers to real robots. A key aspect of our grasping model is that it uses ``action-view'' based rendering to simulate future states with respect to different possible actions. By evaluating these states using a learned value function (Q-function), our method is able to better select corresponding actions that maximize total rewards (i.e., grasping success). Our final grasping system is able to achieve reliable 6DoF closed-loop grasping of novel objects across various scene configurations, as well as in dynamic scenes with moving objects.
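The core decision loop described above (render a simulated future view per candidate action, score it with a learned Q-function, execute the best-scoring action) can be summarized in a minimal PyTorch sketch. This is not the authors' implementation: `QNetwork`, `render_action_view`, the image size, and the candidate-action sampling are all illustrative placeholders.

```python
# Minimal sketch (not the paper's code) of "action-view" based action selection:
# for each candidate 6DoF motion, render the observation the camera would see
# after executing it, score that simulated future state with a learned
# Q-function, and greedily execute the highest-scoring candidate.

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Tiny CNN mapping a rendered action-view image to a scalar Q-value."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        x = self.features(views).flatten(1)
        return self.head(x).squeeze(-1)  # one Q-value per candidate view


def render_action_view(action: torch.Tensor) -> torch.Tensor:
    """Placeholder renderer: given a candidate 6DoF end-effector motion,
    return the image the camera would observe after executing it. Faked
    here with noise; a real system would re-render the reconstructed scene
    from the post-action camera pose."""
    return torch.randn(3, 96, 96)


def select_action(q_net: QNetwork, candidates: torch.Tensor) -> torch.Tensor:
    """One greedy closed-loop step: argmax over Q(render(action))."""
    views = torch.stack([render_action_view(a) for a in candidates])
    with torch.no_grad():
        q_values = q_net(views)
    return candidates[q_values.argmax()]


if __name__ == "__main__":
    q_net = QNetwork()
    # 64 random candidate motions, each a 6DoF delta (translation + rotation).
    candidates = torch.randn(64, 6)
    best = select_action(q_net, candidates)
    print("chosen 6DoF action:", best)
```

Because the loop re-renders and re-scores candidates at every control step, the policy can react to moving objects, which is what makes the execution closed-loop rather than a one-shot open-loop grasp plan.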