Unsupervised neural machine translation (UNMT) is especially beneficial for low-resource languages such as those of the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low-resource languages. Recent work proposes to utilize auxiliary parallel data and has achieved state-of-the-art results. In this work, we focus on unsupervised translation between English and Kannada, a low-resource Dravidian language. We additionally utilize a limited amount of auxiliary parallel data between English and other related Dravidian languages. We show that unifying the writing systems is essential for unsupervised translation between Dravidian languages. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for distant language pairs. Our experiments demonstrate that it is crucial to include auxiliary languages that are similar to our focal language, Kannada. Furthermore, we propose a metric to measure language similarity and show that it serves as a good indicator for selecting the auxiliary languages.