Deep learning has significantly improved speech enhancement, but at the cost of increased computational complexity. In this study, we propose an adaptive boosting approach to learning locality-sensitive hash codes that represent audio spectra efficiently. We use the learned hash codes for single-channel speech denoising as an alternative to a complex machine learning model, particularly in resource-constrained environments. Our adaptive boosting algorithm learns simple logistic regressors as the weak learners. Once trained, their binary classification results transform each spectrum of noisy test speech into a bit string. Simple bitwise operations compute the Hamming distance to find the K nearest matching frames in a dictionary of noisy training speech spectra, and the ideal binary masks associated with those frames are averaged to estimate the denoising mask for the test mixture. Our learning algorithm differs from AdaBoost in that the projections are trained to minimize the distance between the self-similarity matrix of the hash codes and that of the original spectra, rather than the misclassification rate. We evaluate our discriminative hash codes on the TIMIT corpus with various noise types and show performance comparable to deep learning methods in terms of denoising quality and complexity.
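The inference procedure described above (binarize each spectrum, search the dictionary by Hamming distance, average the masks of the nearest frames) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the names `W`, `b`, `dict_codes`, `dict_ibms`, and `k` are hypothetical, the projections are assumed to be already trained, and the boosting loop itself is omitted. The last function shows one plausible form of the self-similarity objective, assuming cosine similarity between spectra and normalized agreement between bipolar codes, neither of which is specified in the abstract.

```python
import numpy as np


def hash_codes(spectra, W, b):
    """Turn each spectrum into an L-bit code using L linear projections.

    spectra: (n_frames, n_bins), W: (n_bins, L), b: (L,).
    Each bit is the hard decision of one logistic regressor (logit > 0).
    """
    return (spectra @ W + b) > 0


def hamming_distances(query_code, dict_codes):
    """Count differing bits between one code and every dictionary code."""
    return np.count_nonzero(np.logical_xor(dict_codes, query_code), axis=1)


def estimate_mask(query_spectrum, W, b, dict_codes, dict_ibms, k=3):
    """Average the ideal binary masks of the k nearest dictionary frames."""
    code = hash_codes(query_spectrum[None, :], W, b)[0]
    dist = hamming_distances(code, dict_codes)
    # A stable sort breaks ties in favor of the earlier dictionary frame.
    nearest = np.argsort(dist, kind="stable")[:k]
    return dict_ibms[nearest].mean(axis=0)


def self_similarity_discrepancy(spectra, codes):
    """Squared distance between the two self-similarity matrices.

    Assumption: cosine similarity for the spectra, and the mean product
    of bipolar (+1/-1) bits for the hash codes.
    """
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    unit = spectra / np.maximum(norms, 1e-12)
    target = unit @ unit.T
    bipolar = np.where(codes, 1.0, -1.0)
    approx = bipolar @ bipolar.T / codes.shape[1]
    return float(np.sum((target - approx) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_bins, n_bits = 200, 64, 16
    train_spectra = np.abs(rng.normal(size=(n_train, n_bins)))  # toy data
    train_ibms = (rng.random((n_train, n_bins)) > 0.5).astype(float)
    W = rng.normal(size=(n_bins, n_bits))  # stand-in for learned projections
    b = np.zeros(n_bits)

    dict_codes = hash_codes(train_spectra, W, b)
    test_spectrum = np.abs(rng.normal(size=n_bins))
    mask = estimate_mask(test_spectrum, W, b, dict_codes, train_ibms, k=5)
    print(mask.shape, self_similarity_discrepancy(train_spectra, dict_codes))
```

At test time the only per-frame cost beyond the projections is an XOR and a bit count against the dictionary, which is what makes the approach attractive when resources are limited. A deployed version would presumably pack the bits into machine words rather than boolean arrays.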