Multilingual Word Error Rate Estimation: e-WER3

2 April 2023

Shammur Absar Chowdhury, Ahmed Ali

From arXiv. Accepted at ICASSP. Keywords: multilingual WER estimation, end-to-end systems, multilingual model, automatic word error rate estimation

The success of multilingual automatic speech recognition (ASR) systems has empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge because it depends on manually transcribed speech data in both monolingual and multilingual scenarios. In this paper, we propose a novel multilingual framework, eWER3, jointly trained on acoustic and lexical representations to estimate word error rate (WER). We demonstrate the effectiveness of eWER3 to (i) predict WER without using any internal states from the ASR and (ii) use the multilingual shared latent space to improve performance on closely related languages. We show that our proposed multilingual model outperforms the previous monolingual word error rate estimation method (eWER2) by an absolute 9% increase in Pearson correlation coefficient (PCC), with better overall estimation between the predicted and reference WER.
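
The abstract evaluates the estimator by comparing predicted WER against reference WER using the Pearson correlation coefficient. As a minimal sketch of those two quantities only (not the eWER3 model itself), the Python below computes a conventional word-level WER from a manual transcript and scores a set of predictions with PCC. The function names `word_error_rate` and `pearson`, the whitespace tokenization, and all sample sentences and predicted values are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical (reference transcript, ASR output) pairs.
    pairs = [
        ("the cat sat on the mat", "the cat sit on mat"),
        ("speech recognition is hard", "speech recognition is hard"),
        ("we estimate the error rate", "we estimated error rates"),
    ]
    reference_wer = [word_error_rate(r, h) for r, h in pairs]
    # Made-up estimator outputs, standing in for a model's predictions.
    predicted_wer = [0.30, 0.05, 0.60]
    print("reference WER:", reference_wer)
    print("PCC:", pearson(predicted_wer, reference_wer))
```

Computing the reference WER needs the manual transcript, which is exactly the dependency an estimator such as eWER3 aims to remove at inference time; the PCC step only needs the two lists of numbers.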
