Three-dimensional scene generation holds significant potential in gaming, film, and virtual reality. However, most existing methods adopt a single-step generation process, making it difficult to balance scene complexity with minimal user input. Inspired by the human cognitive process in scene modeling, which progresses from global to local, focuses on key elements, and completes the scene through semantic association, we propose HiGS, a hierarchical generative framework for multi-step associative semantic spatial composition. HiGS enables users to iteratively expand scenes by selecting key semantic objects, offering fine-grained control over regions of interest while the model completes peripheral areas automatically. To support structured and coherent generation, we introduce the Progressive Hierarchical Spatial-Semantic Graph (PHiSSG), which dynamically organizes spatial relationships and semantic dependencies across the evolving scene structure. PHiSSG ensures spatial and geometric consistency throughout the generation process by maintaining a one-to-one mapping between graph nodes and generated objects and supporting recursive layout optimization. Experiments demonstrate that HiGS outperforms single-stage methods in layout plausibility, style consistency, and user preference, offering a controllable and extensible paradigm for efficient 3D scene construction.
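As a rough illustration of the kind of structure PHiSSG describes, the following minimal Python sketch keeps a hierarchical graph in which every node is bound to at most one generated asset and every asset to at most one node, and resolves positions with a recursive top-down pass. It is not the authors' implementation: the names `Node`, `SceneGraph`, `expand`, `bind`, and `world_positions`, the 2D offsets, and the asset identifiers are all hypothetical. The recursive pass only accumulates parent-relative offsets; it stands in for the recursive layout optimization, whose details the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                      # unique semantic label, e.g. "table" (hypothetical)
    offset: tuple[float, float]    # position relative to the parent node (assumed 2D)
    children: list[Node] = field(default_factory=list)


class SceneGraph:
    """Hierarchical graph with a one-to-one node <-> asset binding (illustrative only)."""

    def __init__(self, root: Node) -> None:
        self.root = root
        self._names = {root.name}
        self.asset_of: dict[str, str] = {}  # node name -> generated asset id

    def expand(self, parent: Node, child: Node) -> Node:
        # One expansion step: attach a new semantic object under an existing node.
        if child.name in self._names:
            raise ValueError(f"node {child.name!r} already exists")
        parent.children.append(child)
        self._names.add(child.name)
        return child

    def bind(self, node: Node, asset_id: str) -> None:
        # Enforce the one-to-one mapping in both directions.
        if node.name in self.asset_of:
            raise ValueError(f"node {node.name!r} is already bound")
        if asset_id in self.asset_of.values():
            raise ValueError(f"asset {asset_id!r} is already bound")
        self.asset_of[node.name] = asset_id

    def world_positions(
        self, node: Node | None = None, origin: tuple[float, float] = (0.0, 0.0)
    ) -> dict[str, tuple[float, float]]:
        # Recursive top-down pass: a child's position is its parent's position
        # plus its own offset. A real system would optimize here, not just add.
        if node is None:
            node = self.root
        pos = (origin[0] + node.offset[0], origin[1] + node.offset[1])
        positions = {node.name: pos}
        for child in node.children:
            positions.update(self.world_positions(child, pos))
        return positions


# Global-to-local expansion: room first, then a key object, then a detail.
room = Node("room", (0.0, 0.0))
graph = SceneGraph(room)
table = graph.expand(room, Node("table", (2.0, 1.0)))
cup = graph.expand(table, Node("cup", (0.5, 0.25)))
graph.bind(table, "asset-001")
graph.bind(cup, "asset-002")
positions = graph.world_positions()
```

Because children store offsets relative to their parent, moving a key object such as the table carries its dependent objects along with it, which is one simple way a hierarchy can keep a scene spatially consistent as it grows.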