Recent multimodal fusion methods that integrate images with LiDAR point clouds have shown promise in scene flow estimation. However, the fusion of 4D millimeter-wave radar and LiDAR remains unexplored. Unlike LiDAR, radar is cheaper, more robust under various weather conditions, and can measure point-wise velocity, making it a valuable complement to LiDAR. However, radar inputs pose challenges due to noise, low resolution, and sparsity. Moreover, no existing dataset combines LiDAR and radar data specifically for scene flow estimation. To address this gap, we construct a radar-LiDAR scene flow dataset from a public real-world automotive dataset. We propose an effective preprocessing strategy for radar denoising and scene flow label generation, deriving more reliable flow ground truth for radar points that fall outside object boundaries. Additionally, we introduce RaLiFlow, the first joint scene flow learning framework for 4D radar and LiDAR, which achieves effective radar-LiDAR fusion through a novel Dynamic-aware Bidirectional Cross-modal Fusion (DBCF) module and a carefully designed set of loss functions. The DBCF module integrates dynamic cues from radar into a local cross-attention mechanism, enabling the propagation of contextual information across modalities. Meanwhile, the proposed loss functions mitigate the adverse effects of unreliable radar data during training and enhance instance-level consistency between the scene flow predictions of the two modalities, particularly in dynamic foreground areas. Extensive experiments on the repurposed scene flow dataset demonstrate that our method outperforms existing LiDAR-based and radar-based single-modal methods by a significant margin.