BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation


May 30, 2022


Eleftheria Briakou, Sida I. Wang, Luke Zettlemoyer, Marjan Ghazvininejad

Mined bitexts can contain imperfect translations that yield unreliable training signals for Neural Machine Translation (NMT). While filtering such pairs out is known to improve final model quality, we argue that it is suboptimal in low-resource conditions, where even mined data can be limited. We propose instead to refine the mined bitexts via automatic editing: given a sentence xf in one language and a possibly imperfect translation of it, xe, our model generates a revised version xf' or xe' that yields a more equivalent translation pair (i.e., <xf, xe'> or <xf', xe>). We use a simple editing strategy: (1) mining potentially imperfect translations for each sentence in a given bitext, and (2) learning a model to reconstruct the original translations and to translate, in a multi-task fashion. Experiments demonstrate that our approach improves the quality of CCMatrix mined bitext for 5 low-resource language pairs and 10 translation directions by up to ~8 BLEU points, in most cases improving upon a competitive back-translation baseline.
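The two-step strategy in the abstract can be pictured as a data-preparation step. The sketch below is only an illustration under stated assumptions: the abstract does not say how inputs are formatted or how the two tasks are signaled, so the task tags, the separator token, and the function name `build_multitask_examples` are all hypothetical. It assumes each bitext pair already comes with one mined, possibly imperfect translation.

```python
from typing import Dict, List, Tuple

# Hypothetical markers; the abstract does not specify any input format.
EDIT_TAG = "<edit>"
TRANSLATE_TAG = "<translate>"
SEP = "<sep>"


def build_multitask_examples(
    bitext: List[Tuple[str, str]],
    mined_imperfect: List[str],
) -> List[Dict[str, str]]:
    """Build editing and translation examples from a bitext.

    bitext: (xf, xe) pairs treated as the reference translations.
    mined_imperfect: for each pair, one mined, possibly imperfect
        translation of xf (assumed to be produced by a separate mining step).
    """
    if len(bitext) != len(mined_imperfect):
        raise ValueError("need exactly one mined candidate per bitext pair")

    examples: List[Dict[str, str]] = []
    for (xf, xe), xe_noisy in zip(bitext, mined_imperfect):
        # Editing task: reconstruct the original translation from the
        # source sentence plus the imperfect candidate.
        examples.append({
            "task": "edit",
            "source": f"{EDIT_TAG} {xf} {SEP} {xe_noisy}",
            "target": xe,
        })
        # Translation task: translate the source sentence directly.
        examples.append({
            "task": "translate",
            "source": f"{TRANSLATE_TAG} {xf}",
            "target": xe,
        })
    return examples


if __name__ == "__main__":
    # Placeholder strings stand in for real sentences.
    pairs = [("src sentence 1", "ref translation 1")]
    candidates = ["imperfect translation 1"]
    for example in build_multitask_examples(pairs, candidates):
        print(example)
```

Mixing both example types in one training set is one simple way to realize a multi-task setup; the same construction could be repeated with the roles of xf and xe swapped to cover the other editing direction.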

