In this paper, we propose a novel grasp representation based on the contacts between a multi-finger robotic hand and the object to be manipulated. This representation significantly reduces the dimensionality of the prediction space and accelerates the learning process. We present an effective end-to-end network, CMG-Net, that grasps unknown objects in cluttered environments by efficiently predicting multi-finger grasp poses and hand configurations from a single-shot point cloud. Moreover, we create a synthetic grasp dataset consisting of 5,000 cluttered scenes, 80 object categories, and 20 million grasp annotations. We perform a comprehensive empirical study that demonstrates the effectiveness of our grasp representation and CMG-Net. Our method significantly outperforms the state-of-the-art for three-finger robotic hands. We also demonstrate that a model trained entirely on synthetic data transfers well to real robots.