Recent advances in generative modeling have substantially improved 3D urban generation, enabling applications in digital twins, virtual cities, and large-scale simulations. However, existing methods face two key challenges: (1) they require large-scale 3D city assets for supervised training, which are difficult and costly to obtain, and (2) they rely on semantic or height maps, which are used exclusively for generating buildings in virtual worlds and have no connection to real-world appearance, limiting the realism and generalizability of the generated cities. To address these limitations, we propose Sat2RealCity, a geometry-aware and appearance-controllable framework for 3D urban generation from real-world satellite imagery. Unlike previous city-level generation methods, Sat2RealCity builds generation upon individual building entities, which lets it exploit the rich priors and pretrained knowledge of 3D object generation while substantially reducing dependence on large-scale 3D city assets. Specifically, (1) we introduce an OpenStreetMap (OSM)-based spatial prior strategy to achieve interpretable geometric generation from spatial topology to building instances; (2) we design an appearance-guided controllable modeling mechanism for fine-grained appearance realism and style control; and (3) we construct a semantic-guided generation pipeline powered by a multimodal large language model (MLLM), bridging semantic interpretation and geometric reconstruction. Extensive quantitative and qualitative experiments demonstrate that Sat2RealCity significantly surpasses existing baselines in structural consistency and appearance realism, establishing a strong foundation for real-world-aligned 3D urban content creation. The code will be released soon.