Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have proven to be essential components of many high-performance models. In this paper, we introduce adaptive convolution, an efficient and versatile convolutional module that enhances a model's ability to adaptively represent speech signals. Adaptive convolution performs frame-wise causal dynamic convolution, generating time-varying kernels for each frame by assembling multiple parallel candidate kernels. A lightweight attention mechanism is proposed for adaptive convolution that leverages both current and historical information to assign adaptive weights to each candidate kernel. This enables the convolution operation to adapt to frame-level speech spectral features, leading to more efficient feature extraction and reconstruction. We integrate adaptive convolution into various CNN-based models, highlighting its generalizability. Experimental results demonstrate that adaptive convolution significantly improves performance with a negligible increase in computational complexity, especially for lightweight models. Moreover, we present an intuitive analysis revealing a strong correlation between kernel selection and signal characteristics. Furthermore, we propose the adaptive convolutional recurrent network (AdaptCRN), an ultra-lightweight model that incorporates adaptive convolution and an efficient encoder-decoder design, achieving superior performance compared to models with similar or even higher computational costs.
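The frame-wise mechanism described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the linear attention head `w_attn`, the softmax pooling over kernels, the history length, and all tensor shapes are illustrative stand-ins for the paper's lightweight attention design.

```python
import numpy as np

def adaptive_conv(frames, candidates, w_attn, history=4):
    """Sketch of frame-wise causal dynamic convolution.

    frames:     (T, C) spectral feature frames
    candidates: (K, L, C) parallel candidate kernels of length L
    w_attn:     (K, (history+1)*C) weights of a hypothetical linear
                attention head standing in for the lightweight attention
    """
    T, C = frames.shape
    K, L, _ = candidates.shape
    out = np.zeros((T, C))
    padded = np.vstack([np.zeros((L - 1, C)), frames])   # causal conv padding
    hist = np.vstack([np.zeros((history, C)), frames])   # context for attention
    for t in range(T):
        # attention over the current and past frames -> softmax kernel weights
        ctx = hist[t : t + history + 1].reshape(-1)
        logits = w_attn @ ctx
        a = np.exp(logits - logits.max())
        a /= a.sum()
        # assemble the time-varying kernel for this frame
        kernel = np.tensordot(a, candidates, axes=1)     # (L, C)
        # causal (depthwise) convolution: uses only current and past frames
        out[t] = (kernel * padded[t : t + L]).sum(axis=0)
    return out
```

Because both the attention context and the convolution window look only backward in time, the output at frame `t` never depends on future frames, matching the causal, frame-level adaptation the module is designed for.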