Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

Unsupervised neural machine translation (UNMT), which relies solely on massive monolingual corpora, has achieved remarkable results in several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian, and UNMT systems usually perform poorly when there is not an adequate training corpus for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in this case. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.


