Recent methods in speech and language technology pretrain very large models that are then fine-tuned for specific tasks. However, the benefits of such large models are often limited to a few resource-rich languages of the world. In this work, we make multiple contributions towards building ASR systems for low-resource languages from the Indian subcontinent. First, we curate 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains, including education, news, technology, and finance. Second, using this raw speech data, we pretrain several variants of wav2vec-style models for 40 Indian languages. Third, we analyze the pretrained models and find key properties: codebook vectors of similar-sounding phonemes are shared across languages, representations across layers are discriminative of the language family, and attention heads often attend within small local windows. Fourth, we fine-tune the pretrained model for downstream ASR in 9 languages and obtain state-of-the-art results on 3 public datasets, including on very low-resource languages such as Sinhala and Nepali. Our work establishes that multilingual pretraining is an effective strategy for building ASR systems for the linguistically diverse speakers of the Indian subcontinent.
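To make the fourth step concrete, the sketch below shows how a fine-tuned wav2vec-style CTC checkpoint could be used for transcription via the Hugging Face transformers API. This is a minimal illustration, not the authors' released pipeline: the checkpoint path is a placeholder, and the input is a dummy waveform standing in for real 16 kHz speech.

```python
# Minimal sketch: greedy CTC decoding with a fine-tuned wav2vec-style ASR model.
# CHECKPOINT is a placeholder, not a released model ID.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CHECKPOINT = "path/to/finetuned-wav2vec-asr-checkpoint"  # placeholder

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)
model.eval()

# Stand-in for a real utterance: one second of 16 kHz mono audio.
speech = np.zeros(16_000, dtype=np.float32)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

# Greedy decoding: pick the most likely token per frame, then collapse with CTC rules.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)
print(transcription)
```

In practice, fine-tuning replaces the pretrained model's quantization head with a randomly initialized CTC head over the target language's character vocabulary, then trains on paired speech and transcripts; greedy decoding as above can be swapped for beam search with a language model for better accuracy.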