Face editing methods, essential for tasks such as virtual avatars, digital human synthesis, and identity preservation, have traditionally been built upon GAN-based techniques, while recent work has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still struggle to control specific attributes while keeping the remaining, unchanged attributes consistent, especially the identity characteristics. To address these issues and enable more convenient face-image editing, we propose a novel approach that leverages Stable Diffusion (SD) models together with crude 3D face models to control the lighting, facial expression, and head pose of a portrait photo. We observe that this task essentially requires composing the target background, the identity, and the face attributes to be edited. We therefore strive to sufficiently disentangle the control of these factors so that face editing remains consistent. Specifically, our method, coined RigFace, comprises: 1) a Spatial Attribute Encoder that provides precise and decoupled conditions for background, pose, expression, and lighting; 2) a high-consistency FaceFusion method that transfers identity features from the Identity Encoder to the denoising UNet of a pre-trained SD model; and 3) an Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism relative to existing face editing models.
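To make the conditioning flow described above concrete, the following is a minimal, hypothetical PyTorch sketch of how decoupled spatial conditions and identity tokens could be fed into a pre-trained SD denoising UNet. All module names, layer choices, and tensor shapes here are illustrative assumptions for exposition only; the actual architectures of the Spatial Attribute Encoder, FaceFusion, and Attribute Rigger are defined in the paper body, not by this sketch.

```python
import torch
import torch.nn as nn


class SpatialAttributeEncoder(nn.Module):
    """Encodes the four condition maps (background, pose, expression, lighting;
    3 channels each, image resolution) into one decoupled spatial condition
    tensor at the latent resolution of the SD UNet. Illustrative only."""

    def __init__(self, cond_channels: int = 320):
        super().__init__()
        # stride-8 projection so the condition matches the 1/8-resolution latents
        self.proj = nn.Conv2d(4 * 3, cond_channels, kernel_size=8, stride=8)

    def forward(self, background, pose, expression, lighting):
        cond = torch.cat([background, pose, expression, lighting], dim=1)
        return self.proj(cond)


class AttributeRigger(nn.Module):
    """Injects the spatial conditions into the denoising UNet by adding a
    zero-initialized residual to the noisy latent (one plausible,
    ControlNet-like choice; the paper's actual mechanism may differ)."""

    def __init__(self, cond_channels: int = 320, latent_channels: int = 4):
        super().__init__()
        self.to_latent = nn.Conv2d(cond_channels, latent_channels, kernel_size=1)
        nn.init.zeros_(self.to_latent.weight)  # no-op injection at initialization
        nn.init.zeros_(self.to_latent.bias)

    def forward(self, noisy_latent, cond):
        return noisy_latent + self.to_latent(cond)


def denoise_step(unet, noisy_latent, timestep, id_tokens, cond, rigger):
    """One denoising step: the Attribute Rigger adds the spatial conditions,
    while identity tokens from the Identity Encoder are supplied as the
    cross-attention context of a diffusers-style pre-trained SD UNet."""
    rigged = rigger(noisy_latent, cond)
    return unet(rigged, timestep, encoder_hidden_states=id_tokens).sample
```

Zero-initializing the rigger's output projection keeps the pre-trained UNet's behavior unchanged at the start of training, a common way to attach new conditioning branches without disturbing the SD prior; whether RigFace adopts this particular scheme is left to the method section.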