Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. Inspired by the natural fact that people can imagine the voice of someone when they look at his or her face, we introduce a face-styled diffusion text-to-speech (TTS) model within a unified framework learnt from visible attributes, called Face-TTS. This is the first time that face images are used as a condition to train a TTS model. We jointly train cross-modal biometrics and TTS models to preserve speaker identity between face images and generated speech segments. We also propose a speaker feature binding loss to enforce the similarity of the generated and the ground truth speech segments in speaker embedding space. Since the biometric information is extracted directly from the face image, our method does not require extra fine-tuning steps to generate speech from unseen and unheard speakers. We train and evaluate the model on the LRS3 dataset, an in-the-wild audio-visual corpus containing background noise and diverse speaking styles. The project page is https://facetts.github.io.
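The abstract does not spell out the form of the speaker feature binding loss. As a rough illustration only, the sketch below assumes it can be approximated by a cosine distance between two speaker embeddings, one taken from the generated speech and one from the ground-truth speech. The function names, the choice of cosine distance, and the use of plain Python lists in place of a real speaker encoder's output are all assumptions made for this example, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_binding_loss(emb_generated, emb_reference):
    """Hypothetical binding loss: 1 - cosine similarity.

    Near 0 when the two speaker embeddings point in the same direction,
    1 when they are orthogonal, and 2 when they point in opposite directions.
    """
    return 1.0 - cosine_similarity(emb_generated, emb_reference)


# Toy embeddings standing in for the outputs of a speaker encoder (assumed).
same_speaker = speaker_binding_loss([1.0, 2.0], [2.0, 4.0])
different_speaker = speaker_binding_loss([1.0, 0.0], [0.0, 1.0])
```

Minimizing a term of this kind alongside the usual TTS objective would push the generated speech toward the reference speaker in embedding space, which is the role the abstract assigns to the binding loss.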


Related content

Speech synthesis


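Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on many disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of some specific applications. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, in car navigation systems, and in automated-response call centers, and it has yielded substantial economic benefits. In addition, with the spread of devices closely tied to daily life, such as smartphones, MP3 players, and PDAs, applications of speech synthesis are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be influencing every aspect of people's lives.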
