crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

In this paper, we present crank, an open-source software for developing nonparallel voice conversion (VC) systems. Although we released sprocket, an open-source VC software based on the Gaussian mixture model, for the last VC Challenge, it is not straightforward to apply it to an arbitrary speech corpus, because parallel utterances of the source and target speakers must be prepared to model a statistical conversion function. To address this issue, we developed a new open-source VC software that enables users to model the conversion function using only a nonparallel speech corpus. The software is implemented with a vector-quantized variational autoencoder (VQVAE). To allow rapid examination of the effectiveness of recent technologies developed in this research field, crank also supports several representative autoencoder-based VC methods, such as hierarchical architectures, cyclic architectures, generative adversarial networks, speaker adversarial training, and neural vocoders. Moreover, it can automatically estimate objective measures such as mel-cepstrum distortion and a pseudo mean opinion score based on MOSNet. In this paper, we describe the representative functions developed in crank and make brief comparisons through objective evaluations.
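Two ideas named in the abstract can be illustrated with a small sketch: the vector-quantization step at the core of a VQVAE (each encoder output is replaced by its nearest codebook vector) and mel-cepstrum distortion as an objective measure. This is a minimal pure-Python illustration and not crank's implementation. The function names, the toy codebook, and the choice to skip the 0th cepstral coefficient are assumptions made for the example, and the frames are assumed to be already time-aligned.

```python
import math

def quantize(z, codebook):
    """Return (index, code vector) of the codebook entry nearest to z.

    Nearest is measured by squared Euclidean distance. `codebook` is a
    hypothetical list of equal-length vectors, not crank's data structure.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]

def mel_cepstral_distortion(ref, conv):
    """Mean mel-cepstrum distortion in dB over time-aligned frames.

    Assumption: the 0th coefficient (energy) is skipped, a common convention.
    """
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, c in zip(ref, conv):
        total += scale * math.sqrt(sum((a - b) ** 2 for a, b in zip(r[1:], c[1:])))
    return total / len(ref)

# Toy data, invented for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
print(index, code)

ref = [[1.0, 0.5, 0.2], [1.1, 0.4, 0.1]]
print(mel_cepstral_distortion(ref, ref))
```

In a trained VQVAE the codebook is learned jointly with the encoder and decoder; the lookup above only shows the discretization step that is applied to each encoder output.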

Autoencoder

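An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstructing side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence its name. Several variants of the basic model exist that aim to force the learned representations of the input to assume useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.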
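The reduce-then-reconstruct idea described above can be sketched with a toy linear autoencoder that compresses 2-D inputs to a 1-D code and maps the code back to 2-D. This is only an illustration under simplifying assumptions: the weights `W` are fixed by hand rather than learned, the encoder and decoder share them (tied weights), and there is no nonlinearity. A real autoencoder learns its weights by minimizing the reconstruction error.

```python
import math

# Hypothetical fixed weights: a unit vector along the direction in which
# the toy data varies. A trained autoencoder would learn these values.
W = [1 / math.sqrt(2), 1 / math.sqrt(2)]

def encode(x):
    """Compress a 2-D input to a 1-D code (dot product with W)."""
    return sum(w * xi for w, xi in zip(W, x))

def decode(z):
    """Reconstruct a 2-D vector from the 1-D code using the same weights."""
    return [z * w for w in W]

def reconstruction_error(x):
    """Squared error between an input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

signal = [2.0, 2.0]   # lies along the direction the code preserves
noisy = [2.1, 1.9]    # same signal plus a small off-axis perturbation

print(decode(encode(noisy)))        # close to the clean signal
print(reconstruction_error(signal))
print(reconstruction_error(noisy))
```

The off-axis perturbation in `noisy` cannot be represented by the 1-D code, so the reconstruction falls back onto the clean signal. This is the sense in which the reduced encoding ignores "noise".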
