Learning to navigate in dynamic and complex open-world environments is a critical yet challenging capability for autonomous robots. Existing approaches often rely on cascaded modular frameworks that require extensive hyperparameter tuning, or on learning from limited real-world demonstration data. In this paper, we propose Navigation Diffusion Policy (NavDP), an end-to-end network trained solely in simulation that achieves zero-shot sim-to-real transfer across diverse environments and robot embodiments. The core of NavDP is a unified transformer-based architecture that jointly learns trajectory generation and trajectory evaluation, both conditioned solely on local RGB-D observations. By learning to predict critic values for contrastive trajectory samples, our approach leverages supervision from privileged information available in simulation, which fosters accurate spatial understanding and allows the network to distinguish safe behaviors from dangerous ones. To support this, we develop an efficient data generation pipeline in simulation and construct a large-scale dataset covering over one million meters of navigation experience across 3,000 scenes. Experiments in both simulated and real-world environments demonstrate that NavDP significantly outperforms prior state-of-the-art methods. Furthermore, we identify key factors that influence the generalization performance of NavDP. The dataset and code are publicly available at https://wzcai99.github.io/navigation-diffusion-policy.github.io.