Although multilingual Neural Machine Translation (NMT), which extends Google's multilingual NMT, is able to perform zero-shot translation, and although an iterative self-learning algorithm can improve the quality of zero-shot translation, two problems remain: the multilingual NMT model is prone to generating the wrong target language when performing zero-shot translation, and the self-learning algorithm, which uses beam search to generate synthetic parallel data, destroys the diversity of the generated source language and amplifies the impact of the same noise during the iterative learning process. In this paper, we propose a tagged-multilingual NMT model and improve the self-learning algorithm to address these two problems. First, we extend Google's multilingual NMT model by adding target tokens to the target languages, associating the start tag with the target language to ensure that the source language is translated into the required target language. Second, we improve the self-learning algorithm by replacing beam search with random sampling, which increases the diversity of the generated data and makes it better cover the true data distribution. Experimental results on IWSLT show that the adjusted tagged-multilingual NMT obtains improvements of 9.41 and 7.85 BLEU points over the multilingual NMT on the 2010 and 2017 Romanian-Italian test sets, respectively. Similarly, it obtains improvements of 9.08 and 7.99 BLEU points on Italian-Romanian zero-shot translation. Furthermore, the improved self-learning algorithm outperforms the conventional self-learning algorithm on zero-shot translations.
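The two ideas above can be illustrated with a minimal sketch. The `<2xx>` tag format, the helper names, and the toy distributions are illustrative assumptions, not the paper's released implementation; the sketch only shows the shape of the techniques: tying the target-language tag to both sides of a training pair, and sampling from the model distribution instead of taking the beam/argmax choice when generating synthetic data.

```python
import random

def tag_example(src_tokens, tgt_tokens, tgt_lang):
    """Attach a target-language tag to a training pair.

    Google's multilingual NMT prepends the tag to the source only;
    the tagged-multilingual variant described above also ties the
    decoder's start symbol to the target language, so the tag is
    prepended to the target sequence as well. The "<2it>"-style
    token format is a common convention, assumed here.
    """
    lang_tag = f"<2{tgt_lang}>"
    return [lang_tag] + src_tokens, [lang_tag] + tgt_tokens

def sample_token(probs):
    """Draw one token from a {token: probability} distribution.

    Replacing the beam-search/argmax choice with this kind of
    random sampling diversifies the synthetic source sentences
    generated during self-learning.
    """
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok  # fallback for floating-point rounding

# Tagged Romanian->Italian pair (toy tokens):
src, tgt = tag_example(["buna", "ziua"], ["buon", "giorno"], "it")
# src == ["<2it>", "buna", "ziua"], tgt == ["<2it>", "buon", "giorno"]
```

With beam search, the single highest-probability token would be emitted every time; sampling instead occasionally picks lower-probability tokens, so repeated passes over the same monolingual data produce varied synthetic sources rather than amplifying one fixed error pattern.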