Despite major advances brought by diffusion-based models, current 3D texture generation systems remain hindered by cross-view inconsistency: textures that appear convincing from one viewpoint often fail to align across others. We find that this issue arises from attention ambiguity, where unstructured full attention is applied indiscriminately across tokens and modalities, causing geometric confusion and unstable appearance-structure coupling. To address this, we introduce CaliTex, a geometry-calibrated attention framework that explicitly aligns attention with 3D structure. It comprises two modules: Part-Aligned Attention, which enforces spatial alignment across semantically matched parts, and Condition-Routed Attention, which routes appearance information through geometry-conditioned pathways to preserve spatial fidelity. Coupled with a two-stage diffusion transformer, CaliTex makes geometric coherence an inherent behavior of the network rather than a byproduct of optimization. Empirically, CaliTex produces seamless, view-consistent textures and outperforms both open-source and commercial baselines.
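To make the idea of geometry-calibrated attention concrete, below is a minimal PyTorch-style sketch of how a part-aligned attention bias could be realized. The tensor shapes, the `part_ids` labels, and the additive cross-part penalty are illustrative assumptions on our part, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def part_aligned_attention(q, k, v, part_ids, penalty=-1e4):
    """Attention biased toward tokens of the same semantic part across views.

    q, k, v:  (B, N, D) multi-view token features (hypothetical layout)
    part_ids: (B, N) integer semantic-part label per token (assumed given)
    penalty:  additive bias that suppresses attention across different parts
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bnd,bmd->bnm", q, k) * scale          # (B, N, N) similarity
    same_part = part_ids.unsqueeze(2) == part_ids.unsqueeze(1)   # (B, N, N) part match
    scores = scores + (~same_part).float() * penalty             # down-weight cross-part pairs
    attn = F.softmax(scores, dim=-1)
    return torch.einsum("bnm,bmd->bnd", attn, v)
```

A soft additive penalty (rather than a hard mask) is used here so that gradients can still flow through cross-part pairs; whether CaliTex uses masking, biasing, or another alignment mechanism is not specified in this abstract.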