Object goal navigation (ObjectNav) in unseen environments is a fundamental task in Embodied AI. Agents in existing works learn ObjectNav policies from 2D maps, scene graphs, or image sequences. Since the task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging a 3D scene representation for policy learning in this floor-level task can be prohibitively impractical, owing to low sample efficiency and high computational cost. In this work, we propose a framework for the challenging task of 3D-aware ObjectNav built on two straightforward sub-policies. The two sub-policies, a corner-guided exploration policy and a category-aware identification policy, run simultaneously, using online-fused 3D points as their observation. Through extensive experiments, we show that this framework dramatically improves ObjectNav performance by learning from the 3D scene representation. Our framework achieves the best performance among modular methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.
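The abstract does not specify how the online fusion of 3D points is implemented; below is a minimal, illustrative sketch of the general idea, assuming known pinhole intrinsics `K` and per-frame camera-to-world poses (all function names and the voxel size are hypothetical, not taken from the paper):

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project a depth image (H, W) into camera-frame 3D points (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]  # drop invalid (zero-depth) pixels

def fuse_online(depth_frames, poses, K, voxel=0.05):
    """Fuse per-frame depth into one world-frame cloud, voxel-downsampled."""
    clouds = []
    for depth, T in zip(depth_frames, poses):  # T: 4x4 camera-to-world pose
        pts = backproject_depth(depth, K)
        pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
        clouds.append((pts_h @ T.T)[:, :3])
    pts = np.concatenate(clouds, axis=0)
    # keep one point per occupied voxel so memory stays bounded
    # as the agent keeps observing during navigation
    keys = np.unique(np.floor(pts / voxel).astype(np.int64), axis=0)
    return keys * voxel

# toy usage: two identical 2x2 depth frames observed from identity poses
K = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
depth = np.ones((2, 2))
fused = fuse_online([depth, depth], [np.eye(4), np.eye(4)], K)
```

In the toy usage the two frames are identical, so voxel deduplication collapses the eight back-projected points to four unique ones; such a compact, incrementally updated point set is the kind of observation the two sub-policies could consume.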