This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, the decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.