R-CNN基于立体3D的立体立体R-CNN立体立体自动驱动 (Stereo R-CNN based 3D Object Detection for Autonomous Driving)

We propose a 3D object detection method for autonomous driving by fully exploiting the sparse and dense, semantic and geometry information in stereo imagery. Our method, called Stereo R-CNN, extends Faster R-CNN for stereo inputs to simultaneously detect and associate object in left and right images. We add extra branches after stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with 2D left-right boxes to calculate a coarse 3D object bounding box. We then recover the accurate 3D bounding box by a region-based photometric alignment using left and right RoIs. Our method does not require depth input and 3D position supervision, however, outperforms all existing fully supervised image-based methods. Experiments on the challenging KITTI dataset show that our method outperforms the state-of-the-art stereo-based method by around 30% AP on both 3D detection and 3D localization tasks. Code will be made publicly available.

翻译：我们建议了一种3D物体自动驾驶探测方法,充分利用立体图像中的稀薄和稠密、语义和几何信息。我们的方法叫做Stereo R-CNN, 将立体输入的更快R-CNN扩展为立体输入, 以同时检测和连接左面和右面图像中的物体。我们在立体区域建议网络之后增加了额外的分支, 以预测稀疏的键点、观点和物体尺寸, 与 2D 左侧框相结合, 以计算粗糙的 3D 物体捆绑盒。然后我们用以区域为基础的光度校准, 利用左面和右面的RoIs 来回收精确的 3D 捆绑框。我们的方法不需要深度输入和 3D 定位监督, 但是, 优于所有现有的完全受监督的基于图像的方法。在挑战的 KITTI 数据集上进行的实验显示, 我们的方法在3D 探测和 3D 本地化任务中, 大约 30% AP 都比基于立体立体法。

相关内容

R-CNN

关注 26

R-CNN的全称是Region-CNN，它可以说是是第一个成功将深度学习应用到目标检测上的算法。传统的目标检测方法大多以图像识别为基础。一般可以在图片上使用穷举法选出所所有物体可能出现的区域框，对这些区域框提取特征并使用图像识别方法分类，得到所有分类成功的区域后,通过非极大值抑制(Non-maximumsuppression)输出结果。

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

专知会员服务

51+阅读 · 2020年5月28日

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

专知会员服务

33+阅读 · 2020年5月12日

【CVPR2020】通过潦草注释的弱监督显著目标检测，Weakly-Supervised Salient Object Detection via Scribble Annotations

专知会员服务

39+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日