MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Neural network based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive amounts of spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and to speaking-style variation in the speech data. Specifically, we introduce a multi-head model and, by training on both jointly, transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription. MHTTS has three advantages: 1) Our system synthesizes better-quality multi-speaker voices with faster inference speed. 2) Our system is capable of transferring correct text information to data with imperfect transcription, whether that transcription is simulated using corruption or provided by an Automatic Speech Recogniser (ASR). 3) Our system can utilize massive amounts of real spontaneous speech with imperfect transcription and synthesize expressive voices.
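The abstract mentions imperfect transcription "simulated using corruption" but does not say how the corruption is done. As a rough illustration only, the sketch below assumes word-level substitution, deletion, and insertion errors applied at a fixed rate. The function name `corrupt_transcript`, the `error_rate` parameter, and the choice of error types are all hypothetical and are not taken from the paper.

```python
import random


def corrupt_transcript(text, error_rate=0.1, seed=None):
    """Return `text` with random word-level errors (hypothetical scheme).

    Each word is independently corrupted with probability `error_rate`.
    A corrupted word is substituted, deleted, or followed by an inserted
    word, with replacement words drawn from the words of `text` itself.
    This is an assumed stand-in for ASR-like errors, not the paper's method.
    """
    rng = random.Random(seed)
    words = text.split()
    vocab = sorted(set(words))  # sorted so results are reproducible per seed
    corrupted = []
    for word in words:
        if rng.random() >= error_rate:
            corrupted.append(word)  # keep the word unchanged
            continue
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute":
            # May occasionally pick the same word, leaving it unchanged.
            corrupted.append(rng.choice(vocab))
        elif op == "insert":
            corrupted.append(word)
            corrupted.append(rng.choice(vocab))
        # "delete": append nothing, so the word is dropped
    return " ".join(corrupted)


if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog"
    print(corrupt_transcript(clean, error_rate=0.3, seed=0))
```

Sweeping `error_rate` over a range of values would be one way to probe how a system degrades as transcription quality drops. A real study would more likely use character- or phoneme-level errors, or actual ASR output.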


Related content

Speech synthesis


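Speech synthesis, also known as Text-to-Speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on many disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computer technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain application scenarios. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, in car navigation systems, and in automated call-answering centers, and it has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, applications of speech synthesis are extending into entertainment, language teaching, rehabilitation therapy, and other fields. Speech synthesis can be said to be affecting every aspect of people's lives.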
