In audio signal processing, learnable front-ends have shown strong performance across diverse tasks by optimizing task-specific representations. However, their parameters remain fixed once trained, which leaves no room for adaptation during inference and limits robustness in dynamic, complex acoustic environments. In this paper, we introduce a novel adaptive paradigm for audio front-ends that replaces static parameterization with a closed-loop neural controller. Specifically, we simplify the learnable LEAF front-end architecture and integrate a neural controller that adapts the representation by dynamically tuning Per-Channel Energy Normalization (PCEN). The controller leverages both the current and buffered past subband energies to enable input-dependent adaptation at inference time. Experimental results on multiple audio classification tasks demonstrate that the proposed adaptive front-end consistently outperforms prior fixed and learnable front-ends under both clean and complex acoustic conditions. These results highlight neural adaptability as a promising direction for the next generation of audio front-ends.
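A minimal PyTorch sketch of the idea described above, assuming the front-end filterbank produces subband energies of shape (batch, channels, time). The controller's architecture, buffer length, and the set of PCEN parameters it predicts (smoothing coefficient, AGC exponent, offset, compression root) are hypothetical illustrations of the closed-loop mechanism, not the authors' exact design.

```python
import torch
import torch.nn as nn


class AdaptivePCEN(nn.Module):
    """PCEN whose per-channel parameters are predicted frame-by-frame by a
    neural controller from the current and buffered past subband energies."""

    def __init__(self, n_channels: int, buffer_frames: int = 8, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.buffer_frames = buffer_frames
        # Hypothetical controller: maps current + buffered past energies to
        # four per-channel PCEN parameters (s, alpha, delta, r).
        self.controller = nn.Sequential(
            nn.Linear(n_channels * (buffer_frames + 1), 128),
            nn.ReLU(),
            nn.Linear(128, 4 * n_channels),
        )

    def forward(self, E: torch.Tensor) -> torch.Tensor:
        # E: (batch, channels, time) subband energies from the filterbank.
        B, C, T = E.shape
        buf = torch.zeros(B, C, self.buffer_frames, device=E.device)  # past-energy buffer
        M = E[:, :, 0]                       # smoothed energy, initialized with first frame
        out = []
        for t in range(T):
            e_t = E[:, :, t]
            # Closed loop: controller observes current and buffered past energies.
            ctrl_in = torch.cat([e_t, buf.flatten(1)], dim=1)
            s, alpha, delta, r = self.controller(ctrl_in).chunk(4, dim=1)
            s = torch.sigmoid(s)             # smoothing coefficient in (0, 1)
            alpha = torch.sigmoid(alpha)     # gain-control strength in (0, 1)
            delta = nn.functional.softplus(delta) + 1e-3  # positive offset
            r = torch.sigmoid(r)             # compression root in (0, 1)
            M = (1.0 - s) * M + s * e_t      # per-channel energy smoother
            pcen_t = (e_t / (self.eps + M) ** alpha + delta) ** r - delta ** r
            out.append(pcen_t)
            # Shift the buffer and append the newest frame.
            buf = torch.cat([buf[:, :, 1:], e_t.unsqueeze(-1)], dim=2)
        return torch.stack(out, dim=-1)      # (batch, channels, time)
```

Because the PCEN parameters are re-estimated at every frame from the observed energies rather than frozen after training, the normalization can track changing noise and gain conditions at inference time, which is the adaptation behavior the abstract refers to.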