Neural Radiance Fields (NeRF) have recently demonstrated photo-realistic results for the task of novel view synthesis. In this paper, we propose to apply novel view synthesis to the robot relocalization problem: we demonstrate improved camera pose regression thanks to an additional synthetic dataset rendered by the NeRF class of algorithms. To avoid spawning novel views in irrelevant places, we select virtual camera locations from NeRF's internal representation of the 3D geometry of the scene. We further improve the localization accuracy of pose regressors by using the synthesized, realistic, and geometry-consistent images as data augmentation during training. At the time of publication, our approach improved on the state of the art, with a 60% lower error on the Cambridge Landmarks and 7-Scenes datasets. The resulting accuracy thus becomes comparable to that of structure-based methods, without any architecture modification or domain adaptation constraints. Since our method allows almost unlimited generation of training data, we investigate the limitations of camera pose regression as a function of the size and distribution of the training data on public benchmarks. We conclude that pose regression accuracy is bounded mostly by the relatively small and biased datasets rather than by the capacity of the pose regression model to solve the localization task.
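The augmentation pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the NeRF renderer and its learned density field are stubbed out with toy placeholders (`nerf_density`, `render_stub` are hypothetical names), but the structure — rejecting virtual camera locations that fall inside occupied geometry, then merging rendered (image, pose) pairs with the real training set — follows the method the abstract outlines.

```python
import numpy as np

rng = np.random.default_rng(0)

def nerf_density(xyz):
    # Toy stand-in for NeRF's learned density field:
    # "occupied" inside a unit sphere at the origin.
    return (np.linalg.norm(xyz, axis=-1) < 1.0).astype(float)

def sample_virtual_poses(n, bounds=2.0):
    """Sample virtual camera centers, rejecting candidates that land
    inside occupied geometry so novel views are not spawned in
    irrelevant places (e.g. inside walls)."""
    poses = []
    while len(poses) < n:
        c = rng.uniform(-bounds, bounds, size=3)
        if nerf_density(c) == 0.0:  # keep only free-space locations
            poses.append(c)
    return np.stack(poses)

def render_stub(pose):
    # Placeholder for NeRF rendering at the given pose;
    # a real pipeline would return a photo-realistic RGB image.
    return np.zeros((8, 8, 3))

# Real training set (images paired with ground-truth poses).
real_poses = rng.uniform(-2.0, 2.0, size=(10, 3))
real_images = [render_stub(p) for p in real_poses]

# Synthetic set: render novel views at the sampled virtual poses.
synth_poses = sample_virtual_poses(50)
synth_images = [render_stub(p) for p in synth_poses]

# Augmented training data for the pose regressor = real + synthetic.
train_images = real_images + synth_images
train_poses = np.concatenate([real_poses, synth_poses])
```

In the actual method, the rejection test uses the geometry NeRF has already learned, so no extra 3D model of the scene is needed; the regressor itself is trained unchanged on the enlarged dataset.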