BaPipe:为DNN培训探索平衡管道平行主义 (BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training) - 专知论文

会员服务 ·

0

DNN · 前向传播/正向传播 · Networking · Performer · 模型并行 ·

2021 年 1 月 14 日

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training

翻译：BaPipe:为DNN培训探索平衡管道平行主义

Letian Zhao,Rui Xu,Tianqi Wang,Teng Tian,Xiaotian Wang,Wei Wu,Chio-in Ieong,Xi Jin

The size of deep neural networks (DNNs) grows rapidly as the complexity of the machine learning algorithm increases. To satisfy the requirement of computation and memory of DNN training, distributed deep learning based on model parallelism has been widely recognized. We propose a new pipeline parallelism training framework, BaPipe, which can automatically explore pipeline parallelism training methods and balanced partition strategies for DNN distributed training. In BaPipe, each accelerator calculates the forward propagation and backward propagation of different parts of networks to implement the intra-batch pipeline parallelism strategy. BaPipe uses a new load balancing automatic exploration strategy that considers the parameters of DNN models and the computation, memory, and communication resources of accelerator clusters. We have trained different DNNs such as VGG-16, ResNet-50, and GNMT on GPU clusters and simulated the performance of different FPGA clusters. Compared with state-of-the-art data parallelism and pipeline parallelism frameworks, BaPipe provides up to 3.2x speedup and 4x memory reduction in various platforms.

翻译：随着机器学习算法的复杂程度的提高,深神经网络的规模迅速扩大。为了满足对DNN培训的计算和记忆要求,基于模型平行主义的分散深层次学习得到广泛承认。我们提议一个新的管道平行培训框架,即巴皮佩,这个框架可以自动探索管道平行培训方法和DNN分布培训的平衡分割战略。在巴皮佩,每个加速器计算出网络不同部分的前沿传播和后向传播,以实施分包管道平行战略。巴皮佩使用一种新的负载平衡自动探索战略,考虑DNN模型的参数和加速器集群的计算、记忆和通信资源。我们培训了不同的DNN,如VGG-16、ResNet-50和GNMT,关于GPU的集群,并模拟了不同的FPGA集群的性能。与最新数据平行和管道平行框架相比,巴皮佩提供了不同平台上3.2x速度和4x记忆减少。

0

相关内容

DNN

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

44+阅读 · 2020年10月31日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

46+阅读 · 2020年7月4日

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

专知会员服务

13+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

106+阅读 · 2020年5月15日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

105+阅读 · 2020年5月3日

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

专知会员服务

30+阅读 · 2020年4月1日

简明扼要！Python教程手册，206页pdf

简明扼要！Python教程手册，206页pdf

专知会员服务

46+阅读 · 2020年3月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

144+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

CVer

21+阅读 · 2020年6月20日

人工智能 | ACCV 2020等国际会议信息5条

人工智能 | ACCV 2020等国际会议信息5条

Call4Papers

6+阅读 · 2019年6月21日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

8+阅读 · 2018年12月28日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

10+阅读 · 2018年12月4日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

已删除

将门创投

5+阅读 · 2018年10月16日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Fast and Accurate Model Scaling

Arxiv

0+阅读 · 2021年3月11日

UAV Path Planning using Global and Local Map Information with Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年3月10日

Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware

Arxiv

0+阅读 · 2021年3月10日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Arxiv

3+阅读 · 2018年11月20日

A Framework of Transfer Learning in Object Detection for Embedded Systems

Arxiv

3+阅读 · 2018年11月12日

Speeding-up Object Detection Training for Robotics with FALKON

Speeding-up Object Detection Training for Robotics with FALKON

Arxiv

6+阅读 · 2018年8月27日

TBD: Benchmarking and Analyzing Deep Neural Network Training

Arxiv

3+阅读 · 2018年3月16日

SEARNN: Training RNNs with Global-Local Losses

Arxiv

4+阅读 · 2018年1月29日

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning

Arxiv

4+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

前向传播/正向传播

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

44+阅读 · 2020年10月31日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

46+阅读 · 2020年7月4日

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

专知会员服务

13+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

106+阅读 · 2020年5月15日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

105+阅读 · 2020年5月3日

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

专知会员服务

30+阅读 · 2020年4月1日

简明扼要！Python教程手册，206页pdf

简明扼要！Python教程手册，206页pdf

专知会员服务

46+阅读 · 2020年3月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

144+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

热门VIP内容

相关资讯

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

CVer

21+阅读 · 2020年6月20日

人工智能 | ACCV 2020等国际会议信息5条

人工智能 | ACCV 2020等国际会议信息5条

Call4Papers

6+阅读 · 2019年6月21日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

8+阅读 · 2018年12月28日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

AI/ML/DNN硬件加速设计怎么入门？

AI/ML/DNN硬件加速设计怎么入门？

StarryHeavensAbove

10+阅读 · 2018年12月4日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

已删除

将门创投

5+阅读 · 2018年10月16日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

相关论文

Fast and Accurate Model Scaling

Arxiv

0+阅读 · 2021年3月11日

UAV Path Planning using Global and Local Map Information with Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年3月10日

Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware

Arxiv

0+阅读 · 2021年3月10日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Arxiv

3+阅读 · 2018年11月20日

A Framework of Transfer Learning in Object Detection for Embedded Systems

Arxiv

3+阅读 · 2018年11月12日

Speeding-up Object Detection Training for Robotics with FALKON

Speeding-up Object Detection Training for Robotics with FALKON

Arxiv

6+阅读 · 2018年8月27日

TBD: Benchmarking and Analyzing Deep Neural Network Training

Arxiv

3+阅读 · 2018年3月16日

SEARNN: Training RNNs with Global-Local Losses

Arxiv

4+阅读 · 2018年1月29日

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning

Arxiv

4+阅读 · 2018年1月11日

微信扫码咨询专知VIP会员