In this work, we propose a frequency bin-wise method for estimating the single-channel speech presence probability (SPP) with multiple deep neural networks (DNNs) in the short-time Fourier transform (STFT) domain. Conventional DNN-based SPP estimators typically take all frequency bins simultaneously as input features, which makes high model complexity inevitable. To reduce the model complexity and the requirements on the training data, we train separate gated recurrent units (GRUs), each of which takes only a single frequency bin and a few of its neighboring frequency bins as input. The model is trained on the noisy speech together with an a posteriori SPP representation. Experiments were performed on the Deep Noise Suppression challenge dataset. The results show that the frequency bin-wise model improves speech detection accuracy. Finally, we demonstrate that the proposed method outperforms most state-of-the-art SPP estimation methods in terms of both speech detection accuracy and model complexity.
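The bin-wise input described in the abstract can be illustrated with a minimal sketch in plain Python. It only shows how a spectrogram might be split into one input sequence per frequency bin, where each sequence holds that bin plus a few neighbors; each sequence would then be fed to its own GRU, which is not implemented here. The function names, the `half_width` parameter, the edge handling (repeating the edge bin), and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def neighbour_indices(k: int, num_bins: int, half_width: int) -> List[int]:
    """Indices of bin k and its neighbours on each side.

    Assumption: indices outside [0, num_bins - 1] are clamped, so edge bins
    are repeated. The paper may handle edges differently.
    """
    return [min(max(k + offset, 0), num_bins - 1)
            for offset in range(-half_width, half_width + 1)]


def binwise_inputs(magnitudes: List[List[float]],
                   half_width: int) -> List[List[List[float]]]:
    """Split a (frames x bins) magnitude spectrogram into per-bin sequences.

    Result layout: bins x frames x (2 * half_width + 1).
    """
    num_bins = len(magnitudes[0])
    per_bin = []
    for k in range(num_bins):
        idx = neighbour_indices(k, num_bins, half_width)
        # One feature vector per frame, restricted to bin k and its neighbours.
        per_bin.append([[frame[i] for i in idx] for frame in magnitudes])
    return per_bin


# Toy spectrogram: 3 frames, 5 frequency bins (values are made up).
spec = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.1, 1.2, 1.3, 1.4, 1.5],
    [2.1, 2.2, 2.3, 2.4, 2.5],
]
inputs = binwise_inputs(spec, half_width=1)

print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 5 3 3
print(inputs[2][1])  # bin 2, frame 1 -> [1.2, 1.3, 1.4]
```

With this layout, the input size of each per-bin model is `2 * half_width + 1` rather than the full number of frequency bins, which is the source of the complexity reduction the abstract refers to.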