It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave: Linear Stochastic Differential Equation Is All You Need For Audio Generation - Zhuanzhi Papers


August 9, 2021

It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave: Linear Stochastic Differential Equation Is All You Need For Audio Generation

Shoule Wu, Ziqiang Shi

From arXiv. The generated audio samples are available at https://shiziqiang.github.io/ito\_audio/

In this paper, we propose to unify the two aspects of voice synthesis, namely text-to-speech (TTS) and the vocoder, in one framework based on a pair of forward and reverse-time linear stochastic differential equations (SDEs). The solutions of this SDE pair are two stochastic processes. One turns the distribution of the mel spectrogram (or waveform) that we want to generate into a simple, tractable distribution. The other is the generation procedure, which turns this simple, tractable signal into the target mel spectrogram (or waveform). The model that generates the mel spectrogram is called It$\hat{\text{o}}$TTS, and the model that generates the waveform is called It$\hat{\text{o}}$Wave. Both use the Wiener process as a driver to gradually subtract the excess signal from the noise signal, generating realistic, meaningful mel spectrograms and audio respectively, conditioned on the original text or the mel spectrogram. Experimental results show that the mean opinion scores (MOS) of It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave can exceed those of current state-of-the-art methods, reaching 3.925$\pm$0.160 and 4.35$\pm$0.115 respectively. The generated audio samples are available at https://shiziqiang.github.io/ito\_audio/. All authors contributed equally to this work.
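To make the forward/reverse SDE pair concrete, here is a minimal toy sketch in Python. It is not the authors' model: the abstract does not give the SDE coefficients, the network, or the conditioning, so everything below is an assumption chosen for illustration. The forward process is an Ornstein-Uhlenbeck-type linear SDE, $dx = -\tfrac{1}{2}\beta x\,dt + \sqrt{\beta}\,dW$, with a hypothetical constant `BETA`. The "data" is a one-dimensional Gaussian standing in for a mel spectrogram or waveform. The score $\nabla_x \log p_t(x)$ is computed in closed form, whereas a real system would presumably approximate it with a trained, conditioned network. The reverse-time SDE is integrated with a plain Euler-Maruyama scheme.

```python
import numpy as np

# All constants are illustrative assumptions, not values from the paper.
BETA = 2.0               # constant noise rate of the linear SDE
T = 5.0                  # time horizon of the forward process
MU0, SIGMA0 = 3.0, 0.5   # toy 1-D Gaussian "data" distribution


def marginal_stats(t):
    """Mean and variance of x_t when x_0 ~ N(MU0, SIGMA0^2)."""
    a = np.exp(-0.5 * BETA * t)          # signal attenuation at time t
    mean = MU0 * a
    var = (SIGMA0 * a) ** 2 + (1.0 - a ** 2)
    return mean, var


def score(x, t):
    """Closed-form score of the Gaussian marginal p_t (stand-in for a network)."""
    mean, var = marginal_stats(t)
    return -(x - mean) / var


def forward_sample(x0, t, rng):
    """Sample x_t given x_0 using the exact transition of the linear SDE."""
    a = np.exp(-0.5 * BETA * t)
    return a * x0 + np.sqrt(1.0 - a ** 2) * rng.standard_normal(x0.shape)


def reverse_sample(n_samples, n_steps, rng):
    """Euler-Maruyama integration of the reverse-time SDE from t=T down to t=0."""
    dt = T / n_steps
    x = rng.standard_normal(n_samples)   # start from the tractable prior N(0, 1)
    for i in range(n_steps):
        t = T - i * dt
        # Reverse drift: f(x, t) - g(t)^2 * score(x, t), with g^2 = BETA.
        drift = -0.5 * BETA * x - BETA * score(x, t)
        x = x - drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n_samples)
    return x


rng = np.random.default_rng(0)
x0 = MU0 + SIGMA0 * rng.standard_normal(20000)   # samples from the toy data
xT = forward_sample(x0, T, rng)                  # data -> near-standard normal
gen = reverse_sample(20000, 1000, rng)           # noise -> approximate data

print("forward  mean/std:", xT.mean(), xT.std())
print("reverse  mean/std:", gen.mean(), gen.std())
```

Under these assumptions, the forward samples should look close to a standard normal, and the reverse samples should land near the toy data distribution. Swapping the closed-form `score` for a learned, text- or mel-conditioned estimator is where an actual TTS or vocoder model would differ.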

