Despite the recent success of multi-view diffusion models for text/image-based 3D asset generation, instruction-based editing of 3D assets lags surprisingly far behind the quality of generation models. The main reason is that recent approaches using 2D priors suffer from view-inconsistent editing signals. Going beyond 2D prior distillation methods and multi-view editing strategies, we propose a training-free editing method that operates within the latent space of a native 3D diffusion model, allowing us to directly manipulate 3D geometry. We guide the edit synthesis by blending its 3D attention maps with those of the source object. Coupled with geometry-aware regularization guidance, a spectral modulation strategy in the Fourier domain, and a refinement step for 3D enhancement, our method outperforms previous 3D editing methods, enabling high-fidelity and precise edits across a wide range of shapes and semantic manipulations. Our project webpage is https://mparelli.github.io/3d-latte
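To make the two core ingredients concrete, below is a minimal PyTorch sketch of (a) blending the edit branch's attention with the source branch's, and (b) Fourier-domain spectral modulation that keeps the source's low-frequency structure while taking high-frequency detail from the edit. All function names, tensor shapes, and thresholds (`tau`, `t_stop`, `cutoff`) are hypothetical illustrations, not the paper's actual implementation.

```python
import torch

def blend_attention(attn_src, attn_edit, tau=0.6, t=0, t_stop=25):
    """Hypothetical sketch: inject source attention during early denoising
    steps, then hand control to the edit branch.
    attn_*: (heads, tokens, tokens) attention maps from the two branches."""
    if t < t_stop:
        return tau * attn_src + (1.0 - tau) * attn_edit
    return attn_edit

def spectral_modulate(z_src, z_edit, cutoff=0.15):
    """Hypothetical sketch of spectral modulation on 3D latents:
    source low frequencies (global structure) + edit high frequencies.
    z_*: (C, D, H, W) latent grids; cutoff is a normalized frequency radius."""
    Fs = torch.fft.fftn(z_src, dim=(-3, -2, -1))
    Fe = torch.fft.fftn(z_edit, dim=(-3, -2, -1))
    # Build a low-pass mask over the 3D frequency grid.
    D, H, W = z_src.shape[-3:]
    fz = torch.fft.fftfreq(D).view(-1, 1, 1)
    fy = torch.fft.fftfreq(H).view(1, -1, 1)
    fx = torch.fft.fftfreq(W).view(1, 1, -1)
    low = (fz**2 + fy**2 + fx**2).sqrt() < cutoff  # broadcasts to (D, H, W)
    # Take source spectrum inside the low-frequency ball, edit spectrum outside.
    blended = torch.where(low, Fs, Fe)
    return torch.fft.ifftn(blended, dim=(-3, -2, -1)).real
```

In this reading, the attention blend anchors the edited object's layout to the source early in sampling, while the spectral blend prevents the edit from drifting away from the source's coarse geometry; both are training-free operations applied inside the diffusion model's latent space, consistent with the method described above.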