Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation



Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Adel Moumen, Sanchit Gandhi

Source: arXiv. Leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard ; Code: https://github.com/huggingface/open_asr_leaderboard

Despite rapid progress, ASR evaluation remains saturated with short-form English, and efficiency is rarely reported. We present the Open ASR Leaderboard, a fully reproducible benchmark and interactive leaderboard comparing 60+ open-source and proprietary systems across 11 datasets, including a dedicated multilingual track. We standardize text normalization and report both word error rate (WER) and inverse real-time factor (RTFx), enabling fair accuracy-efficiency comparisons. For English transcription, Conformer encoders paired with LLM decoders achieve the best average WER but are slower, while CTC and TDT decoders deliver much better RTFx, making them attractive for long-form and offline use. Whisper-derived encoders fine-tuned for English improve accuracy but often trade off multilingual coverage. All code and dataset loaders are open-sourced to support transparent, extensible evaluation.
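To make the two reported metrics concrete, here is a minimal sketch of how WER and RTFx can be computed after a shared normalization step. It is not the leaderboard's code: the abstract does not describe the actual normalizer, so `normalize` is a deliberately simple stand-in (lowercasing and stripping ASCII punctuation), and the function names `normalize`, `word_edit_distance`, `corpus_wer`, and `rtfx` are hypothetical. Aggregating WER at corpus level (total errors over total reference words) is also an assumption. RTFx is taken as seconds of audio divided by seconds of processing time, so higher is faster.

```python
import re
import string


def normalize(text: str) -> str:
    """Toy normalizer (assumption): lowercase, drop ASCII punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to empty hypothesis prefix: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word errors divided by total reference words, after normalization."""
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = normalize(ref).split()
        hyp_tokens = normalize(hyp).split()
        errors += word_edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    return errors / ref_words


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    refs = ["Hello, world!", "The cat sat on the mat."]
    hyps = ["hello world", "the cat sat on a mat"]
    print(corpus_wer(refs, hyps))  # 1 substitution over 8 reference words -> 0.125
    print(rtfx(3600.0, 30.0))      # one hour of audio in 30 s -> 120.0
```

Applying the same normalizer to both reference and hypothesis is what keeps casing and punctuation differences from being counted as word errors, which is why a standardized normalization step matters for fair comparisons across systems.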

