Robots play an increasingly important role in daily life. In human-centered environments, they often encounter piles of objects, packed items, or isolated objects, so a robot must be able to grasp and manipulate different objects in a variety of situations to help humans with daily tasks. In this paper, we propose a multi-view deep learning approach for robust object grasping in human-centric domains. Our approach takes a point cloud of an arbitrary object as input and generates orthographic views of that object. These views are then used to estimate a pixel-wise grasp synthesis for each object. We train the model end-to-end on a small object grasp dataset and test it both in simulation and on real-world data without any further fine-tuning. To evaluate the proposed approach, we performed extensive experiments in three scenarios: isolated objects, packed items, and piles of objects. Experimental results show that our approach performs well in all simulation and real-robot scenarios and achieves reliable closed-loop grasping of novel objects across various scene configurations.
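As an illustration of the first stage described above (point cloud in, orthographic views out), the following is a minimal sketch only. The abstract does not specify how the views are rendered, so the choice of three axis-aligned views, the view names, the image resolution, and the function name `orthographic_depth_views` are all assumptions made for this example; the grasp-synthesis network itself is not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Image = List[List[Optional[float]]]

# Hypothetical view layout: (viewing axis, image-column axis, image-row axis).
VIEWS = {"front": (0, 1, 2), "side": (1, 0, 2), "top": (2, 0, 1)}


def orthographic_depth_views(points: Sequence[Point], resolution: int = 32) -> Dict[str, Image]:
    """Project a point cloud onto three axis-aligned planes (assumed setup).

    Each pixel stores the largest normalized coordinate along the viewing
    axis among the points that fall into it, or None if no point does.
    """
    if not points:
        raise ValueError("point cloud is empty")

    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # A single shared scale keeps the object's proportions equal in all views.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0

    views: Dict[str, Image] = {}
    for name, (depth_axis, u_axis, v_axis) in VIEWS.items():
        image: Image = [[None] * resolution for _ in range(resolution)]
        for p in points:
            # Parallel projection: drop the viewing axis, bin the other two.
            col = min(int((p[u_axis] - mins[u_axis]) / extent * resolution), resolution - 1)
            row = min(int((p[v_axis] - mins[v_axis]) / extent * resolution), resolution - 1)
            value = (p[depth_axis] - mins[depth_axis]) / extent
            # Keep only the surface point with the largest coordinate per pixel.
            if image[row][col] is None or value > image[row][col]:
                image[row][col] = value
        views[name] = image
    return views
```

In a pipeline of the kind the abstract describes, each of these images would be passed to the learned model, which predicts a grasp for every pixel; how the final grasp is selected among views is not stated in the abstract.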