Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora and do not make use of the strong cross-lingual signal contained in parallel data. In this paper, we present PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models), which extends the conventional denoising objective used to train these models by (i) replacing words in the noised sequence according to a multilingual dictionary, and (ii) predicting the reference translation according to a parallel corpus instead of recovering the original sequence. Our experiments on machine translation and cross-lingual natural language inference show average improvements of 2.0 BLEU points and 6.7 accuracy points, respectively, from integrating parallel data into pretraining, obtaining results that are competitive with several popular models at a fraction of their computational cost.
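To make the two modifications to the denoising objective concrete, the following is a minimal Python sketch of the noising step, assuming whitespace tokenization. The mask token, replacement probability, masking probability, and dictionary format are illustrative placeholders, not the paper's exact configuration.

```python
import random

def noise_with_dictionary(tokens, dictionary, replace_prob=0.3,
                          mask_prob=0.35, mask_token="<mask>"):
    """Noise a token sequence: (i) swap words for dictionary
    translations, and otherwise apply standard masking noise.
    Probabilities and mask token are assumed values, not the paper's."""
    noised = []
    for tok in tokens:
        if tok in dictionary and random.random() < replace_prob:
            # (i) replace the word with a translation from the
            # multilingual dictionary
            noised.append(random.choice(dictionary[tok]))
        elif random.random() < mask_prob:
            # conventional denoising noise: mask the token
            noised.append(mask_token)
        else:
            noised.append(tok)
    return noised

def make_training_pair(src_tokens, tgt_tokens, dictionary):
    """(ii) when parallel data is available, the target is the
    reference translation rather than the original sequence."""
    return noise_with_dictionary(src_tokens, dictionary), tgt_tokens

# usage with a toy English->Spanish dictionary
dictionary = {"house": ["casa"], "big": ["grande"]}
src = "the big house is old".split()
tgt = "la casa grande es vieja".split()
print(make_training_pair(src, tgt, dictionary))
```

In this sketch, monolingual examples would simply pass the same sequence as both `src_tokens` and `tgt_tokens`, so the single objective covers both the denoising and translation cases.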