We demonstrate how the often overlooked inherent properties of large-scale LiDAR point clouds can be effectively utilized for self-supervised representation learning. In pursuit of this goal, we design a highly data-efficient feature pre-training backbone that considerably reduces the need for tedious 3D annotations to train state-of-the-art object detectors. We propose the Masked AutoEncoder for LiDAR point clouds (MAELi), which intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction. Our approach yields more expressive and useful features, which can be directly applied to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction schema, MAELi distinguishes between free and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widely used 3D backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained features on various 3D object detection architectures. Our method achieves significant performance improvements when only a small fraction of labeled frames is available for fine-tuning object detectors. For instance, with ~800 labeled frames, MAELi features improve a SECOND model by +10.79 APH (LEVEL 2) on Waymo Vehicles.
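To make the spherical-projection masking idea concrete, the following is a minimal, hypothetical sketch in NumPy: points are binned by azimuth and inclination into a range image, and random patches of that image are dropped so that entire angular regions are hidden from the encoder. The bin counts, patch shape, and mask ratio are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' implementation): mask LiDAR points via
# their spherical projection by dropping random patches of the range image.
import numpy as np

def spherical_patch_mask(points, h_bins=64, w_bins=2048,
                         patch=(4, 16), mask_ratio=0.7, seed=0):
    """Return a boolean mask (True = keep) over the input points.

    points     : (N, 3) array of x, y, z LiDAR coordinates
    h_bins     : number of inclination (vertical) bins of the range image
    w_bins     : number of azimuth (horizontal) bins of the range image
    patch      : patch size (rows, cols) in range-image cells
    mask_ratio : fraction of patches whose points are dropped
    """
    rng = np.random.default_rng(seed)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9

    # Spherical projection: azimuth in [0, 2*pi), inclination in [0, pi]
    azimuth = np.arctan2(y, x) % (2 * np.pi)
    inclination = np.arccos(np.clip(z / r, -1.0, 1.0))

    # Discretize into range-image cells, then group cells into patches
    row = np.clip((inclination / np.pi * h_bins).astype(int), 0, h_bins - 1)
    col = np.clip((azimuth / (2 * np.pi) * w_bins).astype(int), 0, w_bins - 1)
    patch_id = (row // patch[0]) * (w_bins // patch[1]) + (col // patch[1])

    # Randomly select patches to mask and drop all points falling into them
    unique_ids = np.unique(patch_id)
    n_masked = int(mask_ratio * len(unique_ids))
    masked_ids = rng.choice(unique_ids, size=n_masked, replace=False)
    return ~np.isin(patch_id, masked_ids)

# Usage: the visible points would feed the sparse encoder, while the full
# cloud serves as the reconstruction target (hypothetical setup).
pts = np.random.randn(100000, 3) * 20
visible = pts[spherical_patch_mask(pts)]
```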