Neural machine translation - using neural networks to translate human language - is an area of active research exploring new neuron types and network topologies with the goal of dramatically improving machine translation performance. Current state-of-the-art approaches, such as the multi-head attention-based transformer, require very large translation corpora and many epochs to produce models of reasonable quality. Recent attempts to parallelize the official TensorFlow "Transformer" model across multiple nodes have hit roadblocks due to excessive memory use and the resulting out-of-memory errors when performing MPI collectives. This paper describes modifications made to the Horovod MPI-based distributed training framework to reduce memory usage for transformer models by converting assumed-sparse tensors to dense tensors, and subsequently replacing sparse gradient gather with dense gradient reduction. The result is a dramatic increase in scale-out capability, with CPU-only scaling tests achieving 91% weak scaling efficiency up to 1200 MPI processes (300 nodes) and 65% strong scaling efficiency up to 400 MPI processes (200 nodes) on the Stampede2 supercomputer.
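For context, the densification the abstract refers to amounts to converting TensorFlow's tf.IndexedSlices gradients (produced, for example, by embedding lookups) into ordinary dense tensors before the collective, so Horovod can perform a fixed-size dense allreduce instead of an allgather of variable-size sparse updates. Below is a minimal sketch of that idea using the TF1-era API current when this work was done; the toy embedding table, lookup indices, and loss are illustrative stand-ins, not the paper's Transformer graph:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Toy embedding lookup: its gradient w.r.t. the table is a
# tf.IndexedSlices (sparse) object -- the "assumed-sparse" case
# the abstract refers to.
table = tf.get_variable("embedding", shape=[1000, 64])
ids = tf.constant([1, 5, 7])
loss = tf.reduce_sum(tf.nn.embedding_lookup(table, ids))

grads = tf.gradients(loss, [table])

# Densify before the collective: tf.convert_to_tensor turns an
# IndexedSlices into an ordinary dense tensor, so hvd.allreduce
# performs a dense reduction rather than gathering variable-size
# sparse updates from every rank.
dense_grads = [tf.convert_to_tensor(g) if isinstance(g, tf.IndexedSlices) else g
               for g in grads]
averaged = [hvd.allreduce(g) for g in dense_grads]
```

Horovod exposes this same conversion as the sparse_as_dense=True argument to hvd.DistributedOptimizer, which applies it to every gradient automatically.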