3D Scene Graphs (3DSGs) are a powerful representation of the physical world. They explicitly model the spatial, semantic, and functional relationships between entities, giving agents the foundational understanding needed to interact intelligently with their environment and execute versatile behaviors. Embodied navigation, a crucial component of these capabilities, leverages the compact and expressive nature of 3DSGs for long-horizon reasoning and planning in complex, large-scale environments. However, prior works rely on a static-world assumption: they define traversable space solely from the static spatial layout and therefore treat interactable obstacles as non-traversable. This limitation undermines their effectiveness in real-world scenarios, leading to limited reachability, low efficiency, and poor extensibility. To address these issues, we propose HERO, a novel framework for constructing Hierarchical Traversable 3DSGs that redefines traversability by modeling operable obstacles as pathways, capturing their physical interactivity, functional semantics, and the scene's relational hierarchy. Results show that, relative to its baseline, HERO reduces path length (PL) by 35.1% in partially obstructed environments and increases success rate (SR) by 79.4% in fully obstructed ones, demonstrating substantially higher efficiency and reachability.
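To make the idea of treating operable obstacles as pathways concrete, the following is a minimal toy sketch, not HERO's actual method. It assumes a flat room-level graph in which each edge carries a travel cost and an optional interaction cost (for example, the effort of opening a door). The room names, cost values, and the `shortest_path` helper are all hypothetical, and plain Dijkstra search stands in for whatever planner the framework uses.

```python
import heapq


def shortest_path(edges, start, goal, allow_operable):
    """Dijkstra over an undirected room graph (illustrative only).

    edges: list of (u, v, travel_cost, interaction_cost). interaction_cost is
    None for a free passage, or a number for an operable obstacle such as a
    closed door.
    """
    graph = {}
    for u, v, travel, interact in edges:
        if interact is not None and not allow_operable:
            continue  # static-world assumption: operable obstacles block passage
        cost = travel + (interact or 0.0)
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(queue, (new_dist, nxt, path + [nxt]))
    return None  # goal unreachable


# Hypothetical scene: all names and costs are made up for illustration.
EDGES = [
    ("hall", "kitchen", 4.0, None),
    ("kitchen", "office", 6.0, None),
    ("hall", "office", 3.0, 2.0),      # closed door: traversable after opening
    ("office", "storage", 2.0, 1.5),   # storage is reachable only through a door
]

# Partially obstructed: a detour exists, but the door offers a shorter route.
static_partial = shortest_path(EDGES, "hall", "office", allow_operable=False)
hero_like_partial = shortest_path(EDGES, "hall", "office", allow_operable=True)

# Fully obstructed: the goal is reachable only by operating an obstacle.
static_full = shortest_path(EDGES, "hall", "storage", allow_operable=False)
hero_like_full = shortest_path(EDGES, "hall", "storage", allow_operable=True)
```

In this toy scene, the static-world planner reaches the office with cost 10.0 via the kitchen, whereas allowing the door yields cost 5.0. The storage room is unreachable under the static assumption and reachable with cost 8.5 once doors count as pathways. This mirrors, in miniature, the path-length and reachability effects the abstract describes, without reproducing its reported numbers.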