BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Autonomous driving systems perceive their surroundings to make decisions, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework, but substantially improve its performance by constructing a dedicated data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In experiments, BEVDet offers an excellent trade-off between accuracy and time-efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. Its accuracy is comparable with FCOS3D, but it requires only 215.3 GFLOPs, about 11% of the computational budget, and runs 9.2 times faster at 15.6 FPS. Another high-precision version, dubbed BEVDet-Base, scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet.
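
The relative figures above can be unpacked with simple arithmetic. The sketch below is a back-of-the-envelope check, not a set of reported measurements: it assumes that 215.3 GFLOPs and 15.6 FPS are BEVDet-Tiny's own cost and speed, and that "11%" and "9.2 times" are rounded ratios against FCOS3D, so the implied FCOS3D values are only approximate. The function name `implied_fcos3d_baseline` and the variable names are hypothetical and introduced purely for illustration.

```python
# Figures quoted in the abstract (assumed to describe BEVDet itself).
TINY_GFLOPS = 215.3      # BEVDet-Tiny compute cost
TINY_FPS = 15.6          # BEVDet-Tiny inference speed
BUDGET_FRACTION = 0.11   # "just 11% computational budget" (rounded ratio)
SPEEDUP = 9.2            # "runs 9.2 times faster" (rounded ratio)
BASE_MAP, BASE_NDS = 39.3, 47.2   # BEVDet-Base accuracy, in percent
GAIN_MAP, GAIN_NDS = 9.8, 10.0    # margin over FCOS3D, in percentage points


def implied_fcos3d_baseline():
    """Back out the FCOS3D values implied by the quoted ratios and margins.

    Hypothetical helper: the results are approximations derived from
    rounded numbers, not measurements taken from the paper's tables.
    """
    return {
        "gflops": TINY_GFLOPS / BUDGET_FRACTION,  # compute implied by the 11% ratio
        "fps": TINY_FPS / SPEEDUP,                # speed implied by the 9.2x ratio
        "map": BASE_MAP - GAIN_MAP,               # mAP implied by the +9.8 margin
        "nds": BASE_NDS - GAIN_NDS,               # NDS implied by the +10.0 margin
    }


if __name__ == "__main__":
    for name, value in implied_fcos3d_baseline().items():
        print(f"{name}: {value:.1f}")
```

Because the ratios are rounded, the implied compute and speed should be read as order-of-magnitude estimates only; the accuracy values follow directly from subtracting the stated margins.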
