Reproducible closed-loop evaluation remains a major bottleneck in embodied AI tasks such as visual navigation. A promising path forward is high-fidelity simulation that combines photorealistic sensor rendering with geometrically grounded interaction in complex, open-world urban environments. Although recent video-to-3DGS methods ease open-world scene capture, they remain unsuitable for benchmarking due to large visual and geometric sim-to-real gaps. To address these challenges, we introduce Wanderland, a real-to-sim framework that features multi-sensor capture, reliable reconstruction, accurate geometry, and robust view synthesis. Using this pipeline, we curate a diverse dataset of indoor-outdoor urban scenes and systematically demonstrate how image-only pipelines scale poorly, how geometry quality impacts novel view synthesis, and how both adversely affect the reliability of navigation policy learning and evaluation. Beyond serving as a trusted testbed for embodied navigation, Wanderland's rich raw sensor data further enables benchmarking of 3D reconstruction and novel view synthesis models. Our work establishes a new foundation for reproducible research in open-world embodied AI. The project website is at https://ai4ce.github.io/wanderland/.