Recent advances in 3D sensing have created unique challenges for computer vision. One fundamental challenge is finding a good representation for 3D sensor data. Popular representations (such as PointNet) were proposed in the context of processing truly 3D data (e.g., points sampled from mesh models), ignoring the fact that 3D sensed data such as a LiDAR sweep is in fact 2.5D. We argue that representing 2.5D data as a collection of (x, y, z) points fundamentally destroys hidden information about freespace. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelized visibility map as an additional input stream. In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation with virtual objects and temporal aggregation of LiDAR sweeps over multiple time frames. On the NuScenes 3D detection benchmark, we show that adding this additional visibility stream significantly improves the overall detection accuracy of a state-of-the-art 3D detector.
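The core idea of recovering freespace by raycasting can be sketched as follows. This is an illustrative NumPy implementation under our own assumptions (a uniform voxel grid, half-voxel-step ray sampling rather than exact voxel traversal, and a hypothetical `visibility_map` function), not the paper's actual code:

```python
import numpy as np

def visibility_map(points, origin, grid_min, voxel_size, grid_shape):
    """Raycast from the sensor origin to each LiDAR return and build a
    voxelized visibility map: 0 = unknown (never observed), 1 = free
    (a ray passed through), 2 = occupied (a return landed there)."""
    vis = np.zeros(grid_shape, dtype=np.uint8)
    for p in points:
        ray = p - origin
        length = np.linalg.norm(ray)
        # Sample along the ray at half-voxel steps; a simple stand-in
        # for an exact voxel-traversal (Amanatides-Woo style) algorithm.
        n_steps = max(int(np.ceil(length / (0.5 * voxel_size))), 1)
        for t in np.linspace(0.0, 1.0, n_steps, endpoint=False):
            idx = np.floor((origin + t * ray - grid_min) / voxel_size).astype(int)
            if np.all(idx >= 0) and np.all(idx < np.array(grid_shape)):
                vis[tuple(idx)] = 1  # observed as free space
        # The voxel containing the return itself is occupied.
        end = np.floor((p - grid_min) / voxel_size).astype(int)
        if np.all(end >= 0) and np.all(end < np.array(grid_shape)):
            vis[tuple(end)] = 2
    return vis
```

The resulting grid can be fed to a voxel-based detector as an extra input channel alongside the occupancy features; the key point is that a collection of raw (x, y, z) points alone cannot distinguish "empty" voxels from "never observed" ones, while the raycast map can.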