Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier

Recently, variational autoencoders have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. However, variational autoencoders are trained on clean speech only, which limits their ability to extract the speech signal from noisy speech compared to supervised approaches. In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech. The estimated label is a high-level categorical variable describing the speech signal (e.g., speech activity), allowing for a more informed latent distribution compared to the standard variational autoencoder. We evaluate our method with different types of labels on real recordings from different noisy environments. Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms the standard variational autoencoder and a conventional neural network-based supervised approach.
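
One way to picture how a categorical label can inform the latent distribution is a Gaussian prior whose parameters are selected by the label. The sketch below is an illustrative assumption only, not necessarily the formulation used in the paper: the names `prior_mean`, `prior_var`, and `guided_kl`, the two-dimensional latent space, and the numeric values are all hypothetical. It computes the KL term of the variational objective between an approximate posterior and the prior selected by the classifier's label.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical label-dependent prior parameters, one row per label value.
# Label 0 = "no speech activity", label 1 = "speech activity" (illustrative).
prior_mean = np.array([[0.0, 0.0], [2.0, -1.0]])
prior_var = np.array([[1.0, 1.0], [0.5, 0.5]])

def guided_kl(mu_q, var_q, label):
    """KL term when the prior is chosen by the label a classifier estimated."""
    return gaussian_kl(mu_q, var_q, prior_mean[label], prior_var[label])

# A posterior that matches the "speech activity" prior exactly.
mu_q = np.array([2.0, -1.0])
var_q = np.array([0.5, 0.5])
kl_matching_label = guided_kl(mu_q, var_q, label=1)
kl_wrong_label = guided_kl(mu_q, var_q, label=0)
```

In this toy setting the KL term is zero when the label selects the prior that matches the posterior and positive otherwise, which loosely mirrors the abstract's caveat that the benefit depends on the classifier performing well.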


Related content

Autoencoder

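An autoencoder is a type of artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, which aim to force the learned representations of the input to take on useful properties. Autoencoders are applied effectively to many problems, from face recognition to acquiring the semantic meaning of words.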
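
The reduction and reconstruction sides described above can be sketched with a deliberately minimal linear autoencoder trained by gradient descent. This is an illustration under stated assumptions, not a reference implementation: the function name `train_linear_autoencoder`, the synthetic data, and the hyperparameters are all hypothetical, and practical autoencoders add nonlinear layers.

```python
import numpy as np

def train_linear_autoencoder(x, code_dim=2, lr=0.05, steps=2000, seed=0):
    """Fit encoder/decoder matrices by minimising mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
    w_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
    losses = []
    for _ in range(steps):
        code = x @ w_enc          # reduction side: project to a short code
        recon = code @ w_dec      # reconstruction side: map the code back
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        scale = 2.0 / err.size    # derivative of the mean of squared errors
        grad_dec = code.T @ err * scale
        grad_enc = x.T @ (err @ w_dec.T) * scale
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses

# Synthetic data: 5-dimensional points that lie on a 2-dimensional subspace.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(200, 2)) @ data_rng.normal(size=(2, 5))
w_enc, w_dec, losses = train_linear_autoencoder(x)
```

Comparing `losses[0]` with `losses[-1]` shows how much the reconstruction error falls as the two-dimensional code learns to capture the structure of the five-dimensional input.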