We propose a novel method for instance-label segmentation of dense 3D voxel grids. We target volumetric scene representations that have been acquired with depth sensors or multi-view stereo methods and processed with semantic 3D reconstruction or scene completion methods. The main task is to learn shape information about individual object instances in order to accurately separate them, including connected and incompletely scanned objects. We solve the 3D instance-labeling problem with a multi-task learning strategy. The first goal is to learn an abstract feature embedding that groups voxels with the same instance label close together while separating clusters with different instance labels from each other. The second goal is to learn instance information by densely estimating, for each voxel, directional information toward the instance's center of mass. This is particularly useful for finding instance boundaries in the clustering post-processing step, as well as for scoring the segmentation quality for the first goal. Both synthetic and real-world experiments demonstrate the viability and merits of our approach. In fact, it achieves state-of-the-art performance on the ScanNet 3D instance segmentation benchmark.
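The two learning goals above can be illustrated with a minimal numpy sketch. This is an illustrative assumption about the loss structure, not the paper's exact formulation: the embedding goal is sketched as a pull/push discriminative loss (voxels pulled toward their instance's embedding mean, instance means pushed apart), and the direction goal as a cosine loss between predicted per-voxel directions and the unit vectors pointing to each instance's center of mass. All function and parameter names (`multi_task_loss`, `margin_pull`, `margin_push`) are hypothetical.

```python
import numpy as np

def multi_task_loss(embeddings, directions, coords, instance_ids,
                    margin_pull=0.5, margin_push=1.5):
    """Hedged sketch of the two goals (not the paper's exact losses).

    embeddings:   (N, E) per-voxel feature embeddings
    directions:   (N, 3) predicted per-voxel directions to the instance center
    coords:       (N, 3) voxel coordinates in 3D
    instance_ids: (N,)   ground-truth instance label per voxel
    """
    ids = np.unique(instance_ids)
    # Embedding mean ("cluster center") per instance.
    centers = np.stack([embeddings[instance_ids == i].mean(axis=0) for i in ids])

    # Goal 1a (pull): voxels of the same instance close to their center.
    pull = 0.0
    for k, i in enumerate(ids):
        d = np.linalg.norm(embeddings[instance_ids == i] - centers[k], axis=1)
        pull += np.mean(np.maximum(d - margin_pull, 0.0) ** 2)
    pull /= len(ids)

    # Goal 1b (push): centers of different instances far apart.
    push = 0.0
    if len(ids) > 1:
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                d = np.linalg.norm(centers[a] - centers[b])
                push += np.maximum(margin_push - d, 0.0) ** 2
        push /= len(ids) * (len(ids) - 1) / 2

    # Goal 2: cosine loss between predicted directions and unit vectors
    # from each voxel to its instance's center of mass in 3D space.
    dir_loss = 0.0
    for i in ids:
        mask = instance_ids == i
        com = coords[mask].mean(axis=0)                     # center of mass
        gt = com - coords[mask]
        gt /= np.maximum(np.linalg.norm(gt, axis=1, keepdims=True), 1e-8)
        pred = directions[mask]
        pred = pred / np.maximum(
            np.linalg.norm(pred, axis=1, keepdims=True), 1e-8)
        dir_loss += np.mean(1.0 - np.sum(gt * pred, axis=1))
    dir_loss /= len(ids)

    return pull + push + dir_loss
```

In this sketch the direction term supplies the boundary cue mentioned in the abstract: neighboring voxels whose predicted directions point toward different centers of mass likely belong to different instances, which a clustering post-processing step can exploit to split touching objects.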