Scalable Transformers for Neural Machine Translation

The Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and its support for parallel training of sequence generation. However, deploying the Transformer is challenging because different scenarios require models of different complexities and scales. Naively training multiple Transformers is redundant in terms of both computation and memory. In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales that share parameters. Each sub-Transformer can be easily obtained by cropping the parameters of the largest Transformer. A three-stage training scheme is proposed to tackle the difficulty of training the Scalable Transformers; it introduces additional supervision from word-level and sequence-level self-distillation. Extensive experiments were conducted on WMT En-De and En-Fr to validate the proposed Scalable Transformers.
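The abstract says that each sub-Transformer is obtained by cropping the parameters of the largest Transformer, but it does not say which slices are kept. The following is a minimal sketch of that idea for a single linear layer, assuming the sub-layer simply takes the leading rows and columns of the full weight matrix. The names `crop_linear` and `linear` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def crop_linear(weight: Matrix, bias: List[float],
                out_dim: int, in_dim: int) -> Tuple[Matrix, List[float]]:
    """Return the top-left (out_dim x in_dim) block of `weight` and the
    first `out_dim` entries of `bias`.

    Assumption: "cropping" means keeping the leading rows and columns.
    """
    if out_dim > len(weight) or in_dim > len(weight[0]):
        raise ValueError("sub-layer cannot be larger than the full layer")
    sub_weight = [row[:in_dim] for row in weight[:out_dim]]
    sub_bias = bias[:out_dim]
    return sub_weight, sub_bias


def linear(weight: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Compute y = W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Full layer: 4 outputs, 4 inputs (toy values for illustration only).
full_w = [[float(4 * r + c) for c in range(4)] for r in range(4)]
full_b = [0.5, 1.5, 2.5, 3.5]

# Sub-layer: 2 outputs, 2 inputs, cropped from the full layer.
sub_w, sub_b = crop_linear(full_w, full_b, out_dim=2, in_dim=2)
y = linear(sub_w, sub_b, [1.0, 1.0])
```

In this sketch the cropped lists are copies. In a training framework the sub-layer would normally be a view onto the same underlying tensor, so that updates made through any sub-Transformer change the shared parameters.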

Related content

Machine Translation

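Machine Translation covers all branches of computational linguistics and language engineering, including multilingual aspects. Featured papers address theoretical, descriptive, or computational aspects of any of the following topics: the compilation and use of bilingual and multilingual corpora, computer-assisted language teaching, the computational implications of non-Roman character sets, connectionist approaches to translation, contrastive linguistics, and others. Official site: http://dblp.uni-trier.de/db/journals/mt/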

[AAAI 2021] Contrastive Triple Extraction with Generative Transformer

Zhuanzhi Member Service

48+ reads · February 7, 2021

Latest "Transformers Models" tutorial, 64-page PPT

Zhuanzhi Member Service

276+ reads · November 26, 2020

[Berkeley] Imitation Attacks and Defenses for Black-box Machine Translation Systems

Zhuanzhi Member Service

6+ reads · May 4, 2020

Survey paper on multilingual neural machine translation, 34-page PDF: A Comprehensive Survey of Multilingual Neural Machine Translation

Zhuanzhi Member Service

18+ reads · April 25, 2020