While very deep neural networks have proven effective for computer vision and text classification applications, how to increase the depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks onto an NMT model yields no improvement and can even degrade performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which yields significant improvements over strong Transformer baselines on the WMT$14$ English$\to$German and English$\to$French translation tasks\footnote{Our code is available at \url{https://github.com/apeterswu/Depth_Growing_NMT}}.