大型自动语音识别单式传输器调查 (An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition) - 专知论文

会员服务 ·

0

自动语音识别 · 语音识别 · 损失函数（机器学习） · Performer · 泛函 ·

2022 年 4 月 19 日

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

翻译：大型自动语音识别单式传输器调查

Niko Moritz,Frank Seide,Duc Le,Jay Mahadeokar,Christian Fuegen

from arxiv, Submitted to Interspeech 2022

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are the RNN-Transducer (RNN-T) and the connectionist temporal classification (CTC) objectives. Both perform an alignment-free training by marginalizing over all possible alignments, but use different transition rules. Between these two loss types we can classify the monotonic RNN-T (MonoRNN-T) and the recently proposed CTC-like Transducer (CTC-T), which both can be realized using the graph temporal classification-transducer (GTC-T) loss function. Monotonic transducers have a few advantages. First, RNN-T can suffer from runaway hallucination, where a model keeps emitting non-blank symbols without advancing in time, often in an infinite loop. Secondly, monotonic transducers consume exactly one model score per time step and are therefore more compatible and unifiable with traditional FST-based hybrid ASR decoders. However, the MonoRNN-T so far has been found to have worse accuracy than RNN-T. It does not have to be that way, though: By regularizing the training - via joint LAS training or parameter initialization from RNN-T - both MonoRNN-T and CTC-T perform as well - or better - than RNN-T. This is demonstrated for LibriSpeech and for a large-scale in-house data set.

翻译：流出端到端自动语音识别的两个最受欢迎的损失函数是 RNN- Transporter (RNN-Tradinger-Tradinger-T) 和连接器时间分类(CTC) 的目标。两者都通过在所有可能的对齐上进行边化, 但使用不同的过渡规则来进行不协调的培训。在这两种损失类型中,我们可以将单调 RNN- T (MonORNNN-T) 和最近提议的类似CT- Transporter (CT-T) 分类(CT-T) 进行分类, 两者都可以使用图形时间分类- Transporter (GTC-T) 损失函数来实现。单调转换器转换器具有少数的优势。首先, RNNNT 可能会遭受失控的幻觉, 模型在不延时不推进的情况下, 使用不同的过渡规则。第二, 单调式传感器每一步就消耗一个完全的模型,因此与传统的基于FST的混合 SR- 解算器的混合解算器。然而, MON- NRCNNT 迄今比 R- 的大规模培训或MNF- 都比初始化为正常化, 。它不是通过常规化, 运行中显示一个更好的数据。

0

相关内容

自动语音识别

自动语音识别

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

目标跟踪中的时空上下文建模方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于天然产物Drimenal的新型杀菌剂分子设计、合成及构效关系研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型HER2抗体TPC对HER2阳性Trastuzumab耐受型乳腺癌的杀伤作用及分子机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Dicer在先天性巨结肠发病中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

半导体衬底上FeSe薄膜的外延生长及界面超导

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

癌/睾丸抗原HCA587对转录因子NF-κB的调节作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

hMSCs定向汗腺细胞分化中TRAF6信号复合物活化不同NF-κB通路的机制

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

AttX: Attentive Cross-Connections for Fusion of Wearable Signals in Emotion Recognition

Arxiv

0+阅读 · 2022年6月9日

Abstraction not Memory: BERT and the English Article System

Arxiv

0+阅读 · 2022年6月8日

Words are all you need? Capturing human sensory similarity with textual descriptors

Arxiv

0+阅读 · 2022年6月8日

An iterated block particle filter for inference on coupled dynamic systems with shared and unit-specific parameters

Arxiv

0+阅读 · 2022年6月8日

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

Arxiv

0+阅读 · 2022年6月7日

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Arxiv

0+阅读 · 2022年6月7日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Learning to Propagate for Graph Meta-Learning

Arxiv

14+阅读 · 2019年9月11日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

VIP会员

文章信息

相关主题

自动语音识别

损失函数（机器学习）

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】移动计算摄影的神经场表示

大语言模型遇见法律人工智能：综述

【ICCV2025】InfGen：一种分辨率无关的可扩展图像合成范式

美军用无人地面战车发展：现代战争中超越弹药的多元应用

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

AttX: Attentive Cross-Connections for Fusion of Wearable Signals in Emotion Recognition

Arxiv

0+阅读 · 2022年6月9日

Abstraction not Memory: BERT and the English Article System

Arxiv

0+阅读 · 2022年6月8日

Words are all you need? Capturing human sensory similarity with textual descriptors

Arxiv

0+阅读 · 2022年6月8日

An iterated block particle filter for inference on coupled dynamic systems with shared and unit-specific parameters

Arxiv

0+阅读 · 2022年6月8日

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

Arxiv

0+阅读 · 2022年6月7日

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Arxiv

0+阅读 · 2022年6月7日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Learning to Propagate for Graph Meta-Learning

Arxiv

14+阅读 · 2019年9月11日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

相关基金

目标跟踪中的时空上下文建模方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于天然产物Drimenal的新型杀菌剂分子设计、合成及构效关系研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型HER2抗体TPC对HER2阳性Trastuzumab耐受型乳腺癌的杀伤作用及分子机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Dicer在先天性巨结肠发病中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

半导体衬底上FeSe薄膜的外延生长及界面超导

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

癌/睾丸抗原HCA587对转录因子NF-κB的调节作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

hMSCs定向汗腺细胞分化中TRAF6信号复合物活化不同NF-κB通路的机制

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

新型中红外激光晶体Er3＋:CaReAlO4(Re=Y,Gd)的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员