InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes

Reconstructing dynamic driving scenes from dashcam videos has attracted increasing attention because of its importance to autonomous driving and scene understanding. Despite impressive recent progress, most methods still merge all background elements into a single representation, which hinders both instance-level understanding and flexible scene editing. Some approaches attempt to lift 2D segmentation into 3D space, but they often rely on pre-processed instance IDs or complex pipelines to map continuous features to discrete identities. Moreover, these methods are typically designed for indoor scenes with rich viewpoints, which makes them less applicable to outdoor driving scenarios. In this paper, we present InstDrive, an instance-aware 3D Gaussian Splatting framework tailored for the interactive reconstruction of dynamic driving scenes. We use masks generated by SAM as pseudo ground truth to guide 2D feature learning via a contrastive loss and pseudo-supervised objectives. At the 3D level, we introduce regularization to implicitly encode instance identities and enforce consistency through a voxel-based loss. A lightweight static codebook further bridges continuous features and discrete identities without requiring data pre-processing or complex optimization. Quantitative and qualitative experiments demonstrate the effectiveness of InstDrive, and to the best of our knowledge, it is the first framework to achieve 3D instance segmentation in dynamic, open-world driving scenes. More visualizations are available at our project page.
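To make two of the ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a supervised-contrastive-style loss in which features sharing a (pseudo ground-truth) mask ID are pulled together, and it assumes the static codebook is a fixed set of vectors to which each continuous feature is assigned by cosine similarity. The function names, the temperature value, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def mask_contrastive_loss(features, mask_ids, temperature=0.1):
    """Illustrative contrastive loss: samples with the same mask ID are positives.

    features: (N, D) array of per-pixel features, N >= 2.
    mask_ids: (N,) array of integer mask labels (e.g. from a 2D segmenter).
    """
    mask_ids = np.asarray(mask_ids)
    f = l2_normalize(np.asarray(features, dtype=float))
    sim = f @ f.T / temperature

    n = f.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    positives = (mask_ids[:, None] == mask_ids[None, :]) & not_self

    # Log-softmax over every other sample (each sample excludes itself).
    sim = np.where(not_self, sim, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = np.where(positives, sim - log_norm, 0.0)

    # Average over positives per anchor; skip anchors without any positive.
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    per_anchor = -log_prob.sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


def assign_instance_ids(features, codebook):
    """Map each continuous feature to the index of its most similar codebook row."""
    f = l2_normalize(np.asarray(features, dtype=float))
    c = l2_normalize(np.asarray(codebook, dtype=float))
    return np.argmax(f @ c.T, axis=1)


# Toy data (hypothetical): two tight clusters of 2D features.
features = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])

# Mask labels that agree with the clusters versus labels that cut across them.
loss_consistent = mask_contrastive_loss(features, np.array([0, 0, 1, 1]))
loss_inconsistent = mask_contrastive_loss(features, np.array([0, 1, 0, 1]))

# A fixed (static) codebook with one entry per candidate identity.
codebook = np.eye(2)
ids = assign_instance_ids(features, codebook)
print(loss_consistent, loss_inconsistent, ids)
```

In this toy setting the loss is lower when the mask labels agree with the feature clusters, and the codebook lookup turns each continuous feature into a discrete ID with a single nearest-neighbor step. The voxel-based consistency loss mentioned in the abstract is not covered by this sketch.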
