Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNNs) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across the time and frequency dimensions. We validate the effectiveness of the proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) on the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and the performance of back-end speech applications, such as automatic speech recognition, compared to earlier self-attention approaches.
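The abstract does not specify the module's internals, but the idea of a two-dimensional T-F attention map that respects real/imaginary inter-dependencies can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the pooling statistics, the sigmoid gating, and the function name `complex_tfa` are all assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def complex_tfa(real, imag):
    """Hypothetical sketch of a complex-valued time-frequency attention step.

    real, imag : (F, T) arrays, the real and imaginary parts of a complex
    spectrogram feature. The attention statistics are computed from the
    joint magnitude, so the map depends on both parts together rather
    than attending to each part independently.
    """
    mag = np.sqrt(real ** 2 + imag ** 2)                # joint real/imag statistic
    f_att = sigmoid(mag.mean(axis=1, keepdims=True))    # (F, 1): frequency attention
    t_att = sigmoid(mag.mean(axis=0, keepdims=True))    # (1, T): temporal attention
    tfa = f_att * t_att                                 # (F, T): 2-D T-F attention map
    # The same map re-weights both components, preserving their coupling.
    return real * tfa, imag * tfa
```

In this sketch the two one-dimensional attention vectors (over frequency and over time) combine by an outer product into a single 2-D map, which is then applied identically to the real and imaginary channels.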