Multilingual automatic speech recognition (ASR) remains a challenging task, especially when balancing performance across high- and low-resource languages. Recent advances in sequence modeling suggest that architectures beyond Transformers may offer better scalability and efficiency. In this work, we introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new approach that leverages the Mamba architecture -- an efficient state-space model optimized for long-context sequence processing -- for multilingual ASR. Using Mamba, MLMA implicitly incorporates language-aware conditioning and shared representations to support robust recognition across diverse languages. Experiments on standard multilingual benchmarks show that MLMA achieves competitive performance compared to Transformer-based architectures. These results highlight Mamba's potential as a strong backbone for scalable, efficient, and accurate multilingual speech recognition.
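To make the abstract's reference to Mamba concrete, the following is a minimal, simplified sketch of a selective state-space layer of the kind Mamba builds on, applied to frame-level acoustic features. It is not the authors' MLMA implementation: the class name `SimpleSelectiveSSM`, the dimensions, the sequential scan, and the discretization are illustrative assumptions only, and real Mamba implementations use a hardware-efficient parallel scan and additional gating/convolution branches.

```python
# Sketch of a selective state-space (Mamba-style) recurrence over acoustic frames.
# Illustrative only; NOT the MLMA code. Parameterization and scan are simplified.
import torch
import torch.nn as nn


class SimpleSelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Diagonal state matrix A, log-parameterized so -exp(A_log) stays negative (stable).
        self.A_log = nn.Parameter(
            torch.log(torch.arange(1, d_state + 1).float()).repeat(d_model, 1)
        )
        # Input-dependent ("selective") projections for the step size and the B, C matrices.
        self.dt_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model), e.g. projected log-mel frames.
        Bsz, T, D = x.shape
        A = -torch.exp(self.A_log)                              # (D, N)
        dt = torch.nn.functional.softplus(self.dt_proj(x))      # (B, T, D) per-step size
        Bmat = self.B_proj(x)                                   # (B, T, N)
        Cmat = self.C_proj(x)                                   # (B, T, N)

        h = x.new_zeros(Bsz, D, A.shape[-1])                    # hidden state (B, D, N)
        ys = []
        for t in range(T):                                      # sequential scan for clarity
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)          # discretized transition
            dB = dt[:, t].unsqueeze(-1) * Bmat[:, t].unsqueeze(1)
            h = dA * h + dB * x[:, t].unsqueeze(-1)             # state update
            ys.append((h * Cmat[:, t].unsqueeze(1)).sum(-1))    # readout (B, D)
        y = torch.stack(ys, dim=1)                              # (B, T, D)
        return self.out_proj(y)


if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)        # (batch, frames, d_model)
    layer = SimpleSelectiveSSM(d_model=256)
    print(layer(feats).shape)               # torch.Size([2, 100, 256])
```

Because the recurrence processes one frame at a time with a fixed-size state, compute and memory grow linearly with sequence length, which is the scalability property the abstract contrasts with Transformer self-attention.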