Recently, end-to-end speech translation (ST) has gained significant attention because it avoids error propagation. However, the approach suffers from data scarcity: it depends heavily on direct ST data and is less efficient at exploiting speech transcription and text translation data, which are often more readily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A central idea is to increase the similarity of representations of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation by building ST models trained on speech transcription and text translation data. We study the effects of data augmentation and auxiliary loss functions. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points over direct end-to-end ST and +3.1 BLEU points over ST models fine-tuned from an ASR model.
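The auxiliary loss idea mentioned above, pulling together representations of semantically equivalent inputs across modalities or languages, can be sketched as a distance penalty between pooled encoder outputs. This is an illustrative sketch only, not the paper's exact formulation: the mean pooling and squared-L2 distance used here are assumptions, and `auxiliary_similarity_loss` is a hypothetical helper name.

```python
import numpy as np

def auxiliary_similarity_loss(speech_states, text_states):
    """Illustrative auxiliary loss (assumed form, not the paper's exact one):
    squared L2 distance between mean-pooled speech and text encoder states.

    speech_states: (T_speech, d) array of speech encoder outputs
    text_states:   (T_text, d) array of text encoder outputs
    """
    speech_vec = speech_states.mean(axis=0)  # pool over time frames
    text_vec = text_states.mean(axis=0)      # pool over tokens
    return float(np.sum((speech_vec - text_vec) ** 2))

# Identical pooled representations incur zero penalty; mismatched ones are pushed together.
rng = np.random.default_rng(0)
h = rng.standard_normal((10, 4))
g = rng.standard_normal((7, 4))
print(auxiliary_similarity_loss(h, h))  # 0.0
print(auxiliary_similarity_loss(h, g) > 0.0)  # True
```

In training, such a term would typically be added to the main translation loss with a weighting coefficient, so that the encoder is encouraged to map transcribed speech and its text counterpart to nearby points in representation space.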