We propose a novel deep visual odometry (VO) method that exploits global information by selecting memory and refining poses. Existing learning-based methods treat VO as a pure tracking problem, recovering camera poses from image snippets, which leads to severe error accumulation. Global information is crucial for alleviating such accumulated errors, yet effectively preserving it in end-to-end systems is challenging. To address this challenge, we design an adaptive memory module that progressively and adaptively saves information from local to global scales in a neural analogue of memory, enabling our system to model long-term dependencies. Benefiting from the global information in this memory, previous results are further refined by an additional refining module. Guided by previous outputs, we adopt a spatial-temporal attention mechanism to select features for each view based on co-visibility in the feature domain. Our architecture, consisting of Tracking, Remembering, and Refining modules, therefore works beyond pure tracking. Experiments on the KITTI and TUM-RGBD datasets demonstrate that our approach outperforms state-of-the-art methods by large margins and produces competitive results against classic approaches in regular scenes. Moreover, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic algorithms tend to fail.
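To illustrate the idea of attention-based memory selection described above, the following is a minimal numpy sketch. It is not the authors' implementation: scaled dot-product scoring, the feature dimension `d`, and the memory layout `(n, d)` are all assumptions for illustration; the actual module uses learned projections and operates jointly over spatial and temporal dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(query, memory):
    """Hypothetical memory selection for one view.

    query:  (d,)   feature vector of the view being refined
    memory: (n, d) features stored in the adaptive memory
    Returns a context vector (d,) aggregating the memory slots
    weighted by their similarity to the query (a stand-in for
    co-visibility in the feature domain).
    """
    d = query.shape[0]
    scores = memory @ query / np.sqrt(d)   # (n,) similarity scores
    weights = softmax(scores)              # (n,) attention weights, sum to 1
    return weights @ memory                # (d,) selected global context

# Toy usage: a memory of 4 slots with 8-dim features.
rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 8))
context = memory_attention(memory[1], memory)
```

Slots whose features resemble the query dominate the weighted sum, so each view is refined mainly by memory entries it plausibly co-observes.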