提高多链端对端ASR强度的双级增量和适应性四氯化碳聚合 (Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR) - 专知论文

会员服务 ·

0

语音识别 · 稳健性 · 流 · 端到端 · Performer ·

2021 年 2 月 5 日

Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

翻译：提高多链端对端ASR强度的双级增量和适应性四氯化碳聚合

Ruizhi Li,Gregory Sell,Hynek Hermansky

from arxiv, Accepted at IEEE SLT 2021

Performance degradation of an Automatic Speech Recognition (ASR) system is commonly observed when the test acoustic condition is different from training. Hence, it is essential to make ASR systems robust against various environmental distortions, such as background noises and reverberations. In a multi-stream paradigm, improving robustness takes account of handling a variety of unseen single-stream conditions and inter-stream dynamics. Previously, a practical two-stage training strategy was proposed within multi-stream end-to-end ASR, where Stage-2 formulates the multi-stream model with features from Stage-1 Universal Feature Extractor (UFE). In this paper, as an extension, we introduce a two-stage augmentation scheme focusing on mismatch scenarios: Stage-1 Augmentation aims to address single-stream input varieties with data augmentation techniques; Stage-2 Time Masking applies temporal masks on UFE features of randomly selected streams to simulate diverse stream combinations. During inference, we also present adaptive Connectionist Temporal Classification (CTC) fusion with the help of hierarchical attention mechanisms. Experiments have been conducted on two datasets, DIRHA and AMI, as a multi-stream scenario. Compared with the previous training strategy, substantial improvements are reported with relative word error rate reductions of 29.7-59.3% across several unseen stream combinations.

翻译：在测试声学条件不同于培训时,通常会观察到自动语音识别系统的性能退化,因此,必须使自动语音识别系统在各种环境扭曲(如背景噪音和回响)方面强大起来,防止背景噪音和反响等各种环境扭曲。在多流范式中,提高稳健性考虑到处理各种看不见的单流条件和流间动态。以前,在多流终端至终端ASR内提出了一个实用的两阶段培训战略,第二阶段以第1阶段通用地物提取器(UFE)的特征制定多流模式。在本文件中,作为扩展,我们推出了一个以不匹配情景为重点的两阶段增强计划:第1阶段强化计划旨在用数据增强技术处理单流输入品种;第2阶段对随机选定的流流的UFE特性应用时间面罩模拟不同的流组合。在推断中,我们还提出了适应性连接性温度分类(CTC)与等级关注机制的帮助。在两个数据集(DIRHA和AMI)上进行了实验,作为扩展,重点是:第1阶段扩大计划旨在用数据放大技术处理单流的单流品种;第2至5级组合。

0

相关内容

语音识别

语音识别是计算机科学和计算语言学的一个跨学科子领域，它发展了一些方法和技术，使计算机可以将口语识别和翻译成文本。它也被称为自动语音识别（ASR），计算机语音识别或语音转文本（STT）。它整合了计算机科学，语言学和计算机工程领域的知识和研究。

【AAAI2021】记忆门控循环网络

【AAAI2021】记忆门控循环网络

专知会员服务

50+阅读 · 2020年12月28日

【德勤】数字化健康白皮书

专知会员服务

48+阅读 · 2020年12月4日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

极市平台

14+阅读 · 2019年5月16日

弱监督语义分割最新方法资源列表

弱监督语义分割最新方法资源列表

专知

9+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

隔离型双向DC/DC变换器：拓扑结构与控制方法

隔离型双向DC/DC变换器：拓扑结构与控制方法

NE电气

4+阅读 · 2018年4月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【简评】[CVPR2017]Loss Max-Pooling for Semantic Image Segmentation

【简评】[CVPR2017]Loss Max-Pooling for Semantic Image Segmentation

极市平台

5+阅读 · 2017年6月15日

Improving robustness against common corruptions with frequency biased models

Arxiv

0+阅读 · 2021年3月30日

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

Arxiv

0+阅读 · 2021年3月29日

Teacher-Student Training for Robust Tacotron-based TTS

Teacher-Student Training for Robust Tacotron-based TTS

Arxiv

5+阅读 · 2019年11月7日

Fast AutoAugment

Fast AutoAugment

Arxiv

5+阅读 · 2019年5月1日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Arxiv

9+阅读 · 2018年9月17日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

A Robust Real-Time Automatic License Plate Recognition based on the YOLO Detector

Arxiv

13+阅读 · 2018年3月1日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

VIP会员

文章信息

相关主题

相关VIP内容

【AAAI2021】记忆门控循环网络

【AAAI2021】记忆门控循环网络

专知会员服务

50+阅读 · 2020年12月28日

【德勤】数字化健康白皮书

专知会员服务

48+阅读 · 2020年12月4日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

极市平台

14+阅读 · 2019年5月16日

弱监督语义分割最新方法资源列表

弱监督语义分割最新方法资源列表

专知

9+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

隔离型双向DC/DC变换器：拓扑结构与控制方法

隔离型双向DC/DC变换器：拓扑结构与控制方法

NE电气

4+阅读 · 2018年4月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【简评】[CVPR2017]Loss Max-Pooling for Semantic Image Segmentation

【简评】[CVPR2017]Loss Max-Pooling for Semantic Image Segmentation

极市平台

5+阅读 · 2017年6月15日

相关论文

Improving robustness against common corruptions with frequency biased models

Arxiv

0+阅读 · 2021年3月30日

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

Arxiv

0+阅读 · 2021年3月29日

Teacher-Student Training for Robust Tacotron-based TTS

Teacher-Student Training for Robust Tacotron-based TTS

Arxiv

5+阅读 · 2019年11月7日

Fast AutoAugment

Fast AutoAugment

Arxiv

5+阅读 · 2019年5月1日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Arxiv

9+阅读 · 2018年9月17日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

A Robust Real-Time Automatic License Plate Recognition based on the YOLO Detector

Arxiv

13+阅读 · 2018年3月1日

Integrating both Visual and Audio Cues for Enhanced Video Caption

Arxiv

4+阅读 · 2017年12月9日

微信扫码咨询专知VIP会员