Identity-Preserving Realistic Talking Face Generation

Speech-driven facial animation is useful for a variety of applications such as telepresence and chatbots. A realistic face animation requires (1) audio-visual synchronization, (2) identity preservation of the target individual, (3) plausible mouth movements, and (4) natural eye blinks. Existing methods mostly address audio-visual lip synchronization, and only a few recent works address the synthesis of natural eye blinks for overall video realism. In this paper, we propose a method for identity-preserving realistic facial animation from speech. We first generate person-independent facial landmarks from audio, using DeepSpeech features for invariance to different voices, accents, etc. To add realism, we impose eye blinks on the facial landmarks using unsupervised learning, and we retarget the person-independent landmarks to person-specific landmarks to preserve the identity-related facial structure, which helps in generating plausible mouth shapes for the target identity. Finally, we use LSGAN to generate the facial texture from the person-specific facial landmarks, with an attention mechanism that helps to preserve identity-related texture. An extensive comparison of our proposed method with current state-of-the-art methods demonstrates a significant improvement in lip synchronization accuracy, image reconstruction quality, sharpness, and identity preservation. A user study also shows improved realism of our animation results over the state-of-the-art methods. To the best of our knowledge, this is the first work in speech-driven 2D facial animation that simultaneously addresses all of the above-mentioned attributes of realistic speech-driven face animation.
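
The abstract names LSGAN for the texture generation stage but does not spell out its objective. As a rough illustration only, the sketch below shows the standard least-squares GAN losses with the common 0/1 label coding: the discriminator pushes its scores toward 1 on real images and 0 on generated ones, and the generator pushes the scores of its outputs toward 1. The function names, the pure-Python batch representation, and the label values are assumptions made for this example. The paper's actual loss terms, weights, and attention mechanism are not given in the abstract and are not reproduced here.

```python
from typing import Sequence


def mean_squared_gap(scores: Sequence[float], target: float) -> float:
    """Mean of (score - target)^2 over a batch of discriminator outputs."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum((s - target) ** 2 for s in scores) / len(scores)


def lsgan_discriminator_loss(
    real_scores: Sequence[float],
    fake_scores: Sequence[float],
    real_label: float = 1.0,  # assumed target for real images
    fake_label: float = 0.0,  # assumed target for generated images
) -> float:
    """Least-squares discriminator loss: real scores -> real_label, fake -> fake_label."""
    real_term = 0.5 * mean_squared_gap(real_scores, real_label)
    fake_term = 0.5 * mean_squared_gap(fake_scores, fake_label)
    return real_term + fake_term


def lsgan_generator_loss(
    fake_scores: Sequence[float],
    real_label: float = 1.0,  # the generator wants its outputs scored as real
) -> float:
    """Least-squares generator loss: scores of generated images -> real_label."""
    return 0.5 * mean_squared_gap(fake_scores, real_label)


if __name__ == "__main__":
    # Hypothetical discriminator outputs for a batch of two images each.
    real = [0.9, 0.8]
    fake = [0.2, 0.1]
    print("D loss:", lsgan_discriminator_loss(real, fake))
    print("G loss:", lsgan_generator_loss(fake))
```

Compared with the sigmoid cross-entropy loss of the original GAN formulation, the squared penalty keeps growing with a sample's distance from its target score, which is the usual motivation given for choosing LSGAN.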
