Generating high-fidelity 3D content remains a fundamental challenge because a representation must handle arbitrary topologies, such as open surfaces and intricate internal structures, while preserving geometric detail. Prevailing methods based on signed distance fields (SDFs) require costly watertight preprocessing and struggle with non-manifold geometry, while point-cloud representations often suffer from sampling artifacts and surface discontinuities. To overcome these limitations, we propose a 3D variational autoencoder (VAE) framework built upon unsigned distance fields (UDFs), a more robust and computationally efficient representation that naturally handles complex and incomplete shapes. Our core innovation is a local-to-global (LoG) architecture that partitions the UDF into uniform subvolumes, termed UBlocks, and processes them by coupling 3D convolutions, which capture local detail, with sparse transformers, which enforce global coherence. A Pad-Average strategy further ensures smooth transitions at subvolume boundaries during reconstruction. This modular design scales to ultra-high resolutions of up to $2048^3$, a regime previously unattainable for 3D VAEs. Experiments demonstrate state-of-the-art performance in both reconstruction accuracy and generative quality, with superior surface smoothness and geometric flexibility.
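The abstract does not specify how UBlocks are extracted or how Pad-Average blends them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each uniform subvolume is extended by a few voxels of padding on every side and that overlapping voxels are averaged when the blocks are reassembled. The names `split_padded` and `merge_averaged` and the parameters `block` and `pad` are hypothetical, and the grid is a plain nested list to keep the example dependency-free.

```python
from itertools import product


def split_padded(volume, block, pad):
    """Cut a cubic grid volume[z][y][x] into subvolumes of edge `block`,
    each extended by `pad` voxels per side (clipped at the grid border).
    Returns a list of (origin, subvolume) pairs. Assumed scheme, not the paper's."""
    n = len(volume)
    blocks = []
    for oz, oy, ox in product(range(0, n, block), repeat=3):
        lo = [max(o - pad, 0) for o in (oz, oy, ox)]
        hi = [min(o + block + pad, n) for o in (oz, oy, ox)]
        sub = [[[volume[z][y][x] for x in range(lo[2], hi[2])]
                for y in range(lo[1], hi[1])]
               for z in range(lo[0], hi[0])]
        blocks.append((tuple(lo), sub))
    return blocks


def merge_averaged(blocks, n):
    """Reassemble an n^3 grid, averaging every voxel over all padded
    subvolumes that cover it (the assumed 'pad then average' step)."""
    total = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    count = [[[0] * n for _ in range(n)] for _ in range(n)]
    for (z0, y0, x0), sub in blocks:
        for dz, plane in enumerate(sub):
            for dy, row in enumerate(plane):
                for dx, value in enumerate(row):
                    total[z0 + dz][y0 + dy][x0 + dx] += value
                    count[z0 + dz][y0 + dy][x0 + dx] += 1
    return [[[total[z][y][x] / count[z][y][x] for x in range(n)]
             for y in range(n)] for z in range(n)]


# Toy unsigned distance field: distance to the plane x = 3.5 on an 8^3 grid.
n = 8
udf = [[[abs(x - 3.5) for x in range(n)] for _ in range(n)] for _ in range(n)]
blocks = split_padded(udf, block=4, pad=1)
restored = merge_averaged(blocks, n)
```

In this sketch the blocks are passed through unchanged, so the merge simply reproduces the input. In a real pipeline each padded block would be decoded independently, and averaging the overlapping margins would damp any disagreement between neighbouring blocks at the seams.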