VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-speech (TTS) model that uses a very deep Variational Autoencoder (VDVAE) with a residual attention mechanism, which refines the textual-to-acoustic alignment layer by layer. Hierarchical latent variables with different temporal resolutions from the VDVAE are used as queries for the residual attention module. By taking the coarse global alignment from the previous attention layer as an extra input, each subsequent attention layer can produce a refined version of the alignment. This spreads the burden of learning the textual-to-acoustic alignment across multiple attention layers and is more robust than using only a single attention layer. An utterance-level speaking speed factor is computed by a jointly trained speaking speed predictor, which takes the mean-pooled latent variables of the coarsest layer as input, to determine the number of acoustic frames at inference. Experimental results show that VARA-TTS achieves slightly inferior speech quality to an AR counterpart, Tacotron 2, but an order-of-magnitude speed-up at inference, and that it outperforms an analogous non-AR model, BVAE-TTS, in terms of speech quality.
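The residual attention idea in the abstract (a later attention layer reusing the coarse alignment from the previous layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the previous alignment enters the next layer, so adding its logarithm to the attention logits is an assumption made purely for illustration. The function names `upsample_alignment` and `residual_attention`, the nearest-neighbour upsampling, and all tensor sizes are likewise hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def upsample_alignment(prev_align, num_queries):
    """Stretch a (T_prev, N) alignment to (num_queries, N) by row repetition.

    Assumption: nearest-neighbour upsampling; the paper may use another scheme.
    """
    t_prev = prev_align.shape[0]
    idx = np.arange(num_queries) * t_prev // num_queries
    return prev_align[idx]

def residual_attention(queries, keys, values, prev_align=None, eps=1e-8):
    """Scaled dot-product attention biased by a coarser alignment.

    queries: (T, d) latent variables at one temporal resolution
    keys, values: (N, d) text encodings
    prev_align: optional (T_prev, N) alignment from the previous layer
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)
    if prev_align is not None:
        # Assumption: inject the coarse alignment as a log-prior on the logits.
        prior = upsample_alignment(prev_align, queries.shape[0])
        logits = logits + np.log(prior + eps)
    align = softmax(logits, axis=-1)
    return align @ values, align

# Toy example: 6 text tokens, a coarse layer with 4 queries, a finer one with 8.
rng = np.random.default_rng(0)
text = rng.normal(size=(6, 8))
coarse_q = rng.normal(size=(4, 8))
fine_q = rng.normal(size=(8, 8))

_, align_coarse = residual_attention(coarse_q, text, text)
ctx_fine, align_fine = residual_attention(fine_q, text, text, prev_align=align_coarse)
```

In this sketch the coarse layer attends without any prior, and the finer layer only has to adjust an alignment that is already roughly in place, which is one way to read the abstract's claim that the alignment burden is shared across layers.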


Related content

Attention mechanism

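The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind team's paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in a machine translation task; their work is regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of research progress on attention.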