Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS

Recurrent neural networks (RNNs) have become the standard modeling technique for sequence data and are used in a number of novel text-to-speech (TTS) models. However, training a TTS model that includes RNN components places high demands on GPU performance and takes a long time. In contrast, studies have shown that CNN-based sequence synthesis can greatly reduce the training time of TTS models while maintaining a certain level of performance, thanks to its high parallelism. We propose a new TTS system based on deep convolutional neural networks that uses no RNN components (recurrent units). We also improve the generality and robustness of the model through a series of data augmentation methods: time warping, frequency masking, and time masking. The experimental results show that a TTS model built only from CNN components trains faster than classic TTS models such as Tacotron while preserving the quality of the synthesized speech.
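The abstract names frequency masking and time masking but does not give their implementation. As a rough illustration only, the following is a minimal sketch of the two masks applied to a spectrogram stored as a list of frequency rows. The function name `mask_spectrogram`, its parameters, and the choice of one mask of each kind are assumptions made for this example, not the authors' code; time warping is omitted because it needs an interpolation step the abstract does not describe.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng=None, fill=0.0):
    """Return a masked copy of `spec`.

    `spec` is a list of frequency rows, each a list of time frames.
    One random band of frequency bins and one random span of time
    frames are overwritten with `fill`. All names here are hypothetical.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank f consecutive frequency bins starting at f0.
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [fill] * n_time

    # Time mask: blank t consecutive frames starting at t0 in every row.
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        row[t0:t0 + t] = [fill] * t

    return out


if __name__ == "__main__":
    # Toy 8-bin x 20-frame "spectrogram" filled with ones.
    toy = [[1.0] * 20 for _ in range(8)]
    masked = mask_spectrogram(toy, max_freq_width=3, max_time_width=5,
                              rng=random.Random(0))
    for row in masked:
        print("".join("#" if v else "." for v in row))
```

Drawing the mask widths at random for each training example means the model rarely sees the same corrupted input twice, which is the usual motivation for this kind of augmentation on small datasets.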

