This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese--Russian pair for benchmarking. Although there are many solutions for low-resource scenarios, such as multilingual NMT and back-translation, we have empirically confirmed their limited success when restricted to in-domain data. We therefore propose to exploit out-of-domain data through transfer learning: we use it to first train a multilingual NMT model, and then perform multistage fine-tuning on in-domain parallel and back-translated pseudo-parallel data. Our approach, which combines domain adaptation, multilingualism, and back-translation, improves translation quality by more than 3.7 BLEU points over a strong baseline in this extremely low-resource scenario.
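A minimal sketch of the staged training workflow the abstract describes is given below. The function names, corpus labels, and the exact composition of each stage are illustrative assumptions, not the paper's implementation; the stubs only log what each stage would do.

```python
# Sketch of a multistage fine-tuning pipeline: multilingual pre-training on
# out-of-domain data, then fine-tuning on in-domain parallel data and
# back-translated pseudo-parallel data. All names below are hypothetical.

from typing import List


def train_from_scratch(corpora: List[str]) -> str:
    """Placeholder: train a multilingual NMT model on out-of-domain corpora."""
    print(f"Stage 0: train multilingual model on {corpora}")
    return "multilingual-out-of-domain-model"


def back_translate(model: str, monolingual_corpus: str) -> str:
    """Placeholder: build pseudo-parallel data by back-translating monolingual text."""
    print(f"Back-translate {monolingual_corpus} with {model}")
    return f"pseudo-parallel({monolingual_corpus})"


def fine_tune(model: str, corpora: List[str], stage: int) -> str:
    """Placeholder: continue training an existing model on new corpora."""
    print(f"Stage {stage}: fine-tune {model} on {corpora}")
    return f"{model}+stage{stage}"


if __name__ == "__main__":
    # Stage 0: multilingual pre-training on out-of-domain parallel data
    # (the choice of helper language pairs here is an assumption).
    model = train_from_scratch(["out-of-domain Ja-En", "out-of-domain Ru-En"])

    # Stage 1: fine-tune on the small in-domain Ja-Ru parallel corpus.
    model = fine_tune(model, ["in-domain Ja-Ru"], stage=1)

    # Stage 2: add back-translated pseudo-parallel data and fine-tune again.
    pseudo = back_translate(model, "in-domain monolingual Ru")
    model = fine_tune(model, ["in-domain Ja-Ru", pseudo], stage=2)

    print(f"Final model: {model}")
```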