In recent years, 3D object detection based on stereo cameras has made great progress, but most state-of-the-art methods rely on anchor-based 2D detection or depth estimation. The high computational cost of these components makes it difficult for such methods to run in real time. In this work, we propose Stereo CenterNet, a 3D object detection method that uses the geometric information in stereo images. Stereo CenterNet predicts four semantic key points of the object's 3D bounding box and uses the left and right 2D boxes, 3D dimensions, orientation, and key points to recover the bounding box in 3D space. We then use an improved photometric alignment module to further refine the position of the 3D bounding box. Experiments on the KITTI dataset show that our method achieves the best speed-accuracy trade-off among state-of-the-art methods that do not require extra data.
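As a rough illustration of why paired left and right 2D boxes carry 3D information, the sketch below applies the standard rectified-stereo relation z = f * b / d to the horizontal centers of two boxes. It is not the Stereo CenterNet pipeline, which also uses key points, dimensions, orientation, and photometric alignment. The function names, the box layout (x_min, y_min, x_max, y_max), and the numeric values are assumptions made for this example.

```python
def box_center_u(box):
    """Horizontal center of a box given as (x_min, y_min, x_max, y_max) in pixels."""
    x_min, _, x_max, _ = box
    return 0.5 * (x_min + x_max)


def depth_from_disparity(u_left, u_right, focal_px, baseline_m):
    """Depth in meters from the rectified-stereo relation z = f * b / d."""
    disparity = u_left - u_right  # positive for points in front of a rectified pair
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid stereo match")
    return focal_px * baseline_m / disparity


def coarse_depth_from_boxes(left_box, right_box, focal_px, baseline_m):
    """Coarse object depth from the centers of paired left and right 2D boxes."""
    return depth_from_disparity(
        box_center_u(left_box), box_center_u(right_box), focal_px, baseline_m
    )


# Illustrative values only (hypothetical camera and boxes, not taken from the paper).
left_box = (600.0, 150.0, 640.0, 200.0)
right_box = (580.0, 150.0, 620.0, 200.0)
depth = coarse_depth_from_boxes(left_box, right_box, focal_px=720.0, baseline_m=0.5)
print(f"coarse depth: {depth:.1f} m")
```

A box-center estimate like this is only a coarse starting point. A one-pixel error in disparity shifts the depth noticeably at long range, which is one reason a refinement step such as photometric alignment is useful.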