Emphasis control for parallel neural TTS

The semantic information conveyed by a speech signal is strongly influenced by local variations in prosody. Recent parallel neural text-to-speech (TTS) synthesis methods are able to generate speech with high fidelity while maintaining high performance. However, these systems often lack simple control over the output prosody, thus restricting the semantic information conveyable for a given text. This paper proposes a hierarchical parallel neural TTS system for prosodic emphasis control by learning a latent space that directly corresponds to a change in emphasis. Three candidate features for the latent space are compared: 1) variance of pitch and duration within words in a sentence, 2) a wavelet-based feature computed from pitch, energy, and duration, and 3) a learned combination of the above features. Objective measures reveal that the proposed methods are able to achieve a wide range of emphasis modification, and subjective evaluations on the degree of emphasis and the overall quality indicate that they show promise for real-world applications.
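As a rough illustration of the first candidate feature, the sketch below computes the variance of pitch and of duration over the phones of each word in a sentence. The input layout, the function name `word_prosody_variance`, and the use of population variance are assumptions made for illustration only; the abstract does not say how the feature is actually computed.

```python
from statistics import pvariance


def word_prosody_variance(words):
    """Return one (pitch_variance, duration_variance) pair per word.

    `words` is a hypothetical input layout: a list of dicts, each holding
    per-phone pitch values in Hz under "pitch_hz" and per-phone durations
    in seconds under "durations_s".
    """
    features = []
    for word in words:
        # Population variance over the phones of this word; a word with a
        # single phone has zero variance by definition.
        pitch_var = pvariance(word["pitch_hz"])
        duration_var = pvariance(word["durations_s"])
        features.append((pitch_var, duration_var))
    return features


# Toy sentence: the second word has a much wider pitch excursion.
sentence = [
    {"text": "I", "pitch_hz": [120.0], "durations_s": [0.08]},
    {
        "text": "never",
        "pitch_hz": [110.0, 130.0, 150.0, 170.0],
        "durations_s": [0.06, 0.10, 0.06, 0.10],
    },
]

for word, (pitch_var, duration_var) in zip(sentence, word_prosody_variance(sentence)):
    print(word["text"], pitch_var, duration_var)
```

In this toy input the second word varies far more in pitch than the first, which is the kind of contrast a variance-based emphasis feature would presumably pick up.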


Related content

Speech synthesis


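Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, car navigation systems, and automated call centers, and has produced substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can fairly be said to be affecting every aspect of people's lives.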
