Noise suppression systems generally produce output speech of compromised quality. We propose using the high-quality speech generation capability of neural vocoders for noise suppression. A neural network predicts clean mel-spectrogram features from noisy speech, and we compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Both also achieve significantly better subjective quality ratings than the oracle Wiener mask. Of the two vocoders, WaveNet achieves the higher subjective quality scores, although at the cost of much slower waveform generation.
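For readers unfamiliar with the oracle Wiener mask baseline, the following is a minimal sketch of one common definition: a per-bin gain equal to the speech power divided by the sum of speech and noise power, computed with access to the true clean speech and noise (hence "oracle"). The function names, the `eps` stabilizer, and the toy values are illustrative assumptions; the abstract does not specify the exact mask formulation or implementation used in the experiments.

```python
def oracle_wiener_mask(speech_power, noise_power, eps=1e-12):
    """Per-bin gain |S|^2 / (|S|^2 + |N|^2).

    Both inputs are lists of frames, each frame a list of per-frequency-bin
    power values. `eps` (an assumption, not from the source) avoids division
    by zero when a bin contains neither speech nor noise.
    """
    return [
        [s / (s + n + eps) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]


def apply_mask(mask, noisy_magnitude):
    """Scale each noisy magnitude bin by the corresponding mask gain."""
    return [
        [g * x for g, x in zip(g_frame, x_frame)]
        for g_frame, x_frame in zip(mask, noisy_magnitude)
    ]


# Toy example: 2 frames x 3 frequency bins (made-up values).
speech_power = [[4.0, 1.0, 0.0], [9.0, 0.25, 1.0]]
noise_power = [[1.0, 1.0, 1.0], [1.0, 0.25, 3.0]]
noisy_magnitude = [[2.2, 1.4, 1.0], [3.1, 0.7, 2.0]]

mask = oracle_wiener_mask(speech_power, noise_power)
enhanced_magnitude = apply_mask(mask, noisy_magnitude)
```

The gain lies between 0 and 1 in every bin: it approaches 1 where speech dominates and 0 where noise dominates. Because the mask only rescales the noisy spectrogram, it cannot regenerate speech content that the noise has obscured, whereas a vocoder synthesizes the waveform from predicted features.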