HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment

Object-oriented navigation (ObjNav) enables a robot to navigate directly and autonomously to a target object in an unknown environment. Effective perception is critical for autonomous robots navigating such environments. Egocentric observations from RGB-D sensors provide abundant local information, while real-time top-down maps offer valuable global context for ObjNav. Nevertheless, most existing studies focus on a single source and seldom integrate these two complementary perceptual modalities, even though humans naturally attend to both. Building on the rapid advancement of Vision-Language Models (VLMs), we propose Hybrid Perception Navigation (HyPerNav), which leverages VLMs' strong reasoning and vision-language understanding capabilities to jointly perceive local and global information, enhancing the effectiveness and intelligence of navigation in unknown environments. In both extensive simulation evaluation and real-world validation, our method achieves state-of-the-art performance against popular baselines. By simultaneously leveraging information from egocentric observations and the top-down map, the hybrid perception approach captures richer cues and finds objects more effectively. Our ablation study further shows that each component of the hybrid perception contributes to navigation performance.
