An increasing number of applications rely on data-driven models that are deployed for perception tasks across a sequence of scenes. Due to the mismatch between training and deployment data, adapting the model to each new scene is often crucial to obtain good performance. In this work, we study continual multi-scene adaptation for the task of semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on the previous scenes should be maintained. We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model, and then using the view-consistent rendered semantic labels as pseudo-labels to adapt the model. Through joint training with the segmentation model, the Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Furthermore, due to its compact size, it can be stored in a long-term memory and subsequently used to render data from arbitrary viewpoints to reduce forgetting. We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.
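To make the pipeline concrete, below is a minimal PyTorch sketch of the described loop under stated assumptions: `SegmentationModel` and `SemanticNeRF` are toy stand-ins invented for illustration (a real Semantic-NeRF volume-renders color and semantics along camera rays), and the losses, optimizer, and schedule are guesses at the structure rather than the paper's actual implementation.

```python
# A minimal, self-contained sketch of the described method (assumptions:
# toy SegmentationModel / SemanticNeRF stand-ins, invented hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentationModel(nn.Module):
    """Toy stand-in for a 2D semantic segmentation network."""

    def __init__(self, num_classes=21):
        super().__init__()
        self.net = nn.Conv2d(3, num_classes, kernel_size=1)

    def forward(self, rgb):
        return self.net(rgb)  # (B, C, H, W) class logits


class SemanticNeRF(nn.Module):
    """Toy stand-in for a compact per-scene Semantic-NeRF. A real model
    volume-renders color and semantics along camera rays; here both are
    learned grids so the sketch stays runnable."""

    def __init__(self, num_classes=21, res=64):
        super().__init__()
        self.rgb = nn.Parameter(torch.rand(1, 3, res, res))
        self.sem = nn.Parameter(torch.zeros(1, num_classes, res, res))

    def render(self, pose):
        # The pose argument is ignored in this toy version; a real NeRF
        # would render from the given viewpoint.
        return torch.sigmoid(self.rgb), self.sem


def adapt_on_scene(seg_model, nerf, frames, poses, steps=100, lr=1e-4):
    """Jointly fit the per-scene Semantic-NeRF to the 2D predictions and
    adapt the 2D model on the view-consistent rendered pseudo-labels."""
    opt = torch.optim.Adam(
        list(seg_model.parameters()) + list(nerf.parameters()), lr=lr)
    for _ in range(steps):
        for rgb, pose in zip(frames, poses):
            seg_logits = seg_model(rgb)
            _, sem_rendered = nerf.render(pose)
            # Fuse: the NeRF absorbs the (noisy) 2D predictions ...
            loss_fit = F.cross_entropy(sem_rendered, seg_logits.argmax(1))
            # ... and the 2D model learns from the rendered pseudo-labels.
            loss_adapt = F.cross_entropy(seg_logits, sem_rendered.argmax(1))
            opt.zero_grad()
            (loss_fit + loss_adapt).backward()
            opt.step()


def replay(seg_model, memory, poses, lr=1e-4):
    """Render (image, pseudo-label) pairs from the NeRFs kept in long-term
    memory and take supervised steps to reduce forgetting."""
    opt = torch.optim.Adam(seg_model.parameters(), lr=lr)
    for nerf in memory:
        for pose in poses:
            rgb, sem = nerf.render(pose)
            loss = F.cross_entropy(seg_model(rgb.detach()), sem.argmax(1))
            opt.zero_grad()
            loss.backward()
            opt.step()


if __name__ == "__main__":
    seg_model = SegmentationModel()
    memory = []  # compact per-scene NeRFs stored for replay
    fake_scenes = [([torch.rand(1, 3, 64, 64)], [torch.eye(4)])
                   for _ in range(2)]
    for frames, poses in fake_scenes:
        replay(seg_model, memory, poses)  # rehearse previous scenes
        nerf = SemanticNeRF()
        adapt_on_scene(seg_model, nerf, frames, poses, steps=5)
        memory.append(nerf)
```

The design point the sketch tries to capture is that only the compact per-scene NeRF is kept in long-term memory, rather than raw frames, which is what makes rendering replay data from arbitrary viewpoints cheap.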