NUQSGD:通过非统一量化,可确保通信效率高的数据平行 SGD (NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization) - 专知论文

会员服务 ·

0

SGD · Performer · 可约的 · 随机梯度下降 · MoDELS ·

2021 年 5 月 3 日

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

翻译：NUQSGD:通过非统一量化,可确保通信效率高的数据平行 SGD

Ali Ramezani-Kebrya,Fartash Faghri,Ilya Markov,Vitalii Aksenov,Dan Alistarh,Daniel M. Roy

from arxiv, 42 pages, 21 figures. To appear in the Journal of Machine Learning Research (JMLR)

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.

翻译：随着模型和数据集的规模和复杂性的扩大和复杂性的提高,对可用于进行平行模式培训的通信效率高的梯度下降变体的需要也随之增加。数据平行SGD的一种流行的通信压缩法是QSGD(Alistrah等人,2017年),它量化和编码梯度,以减少通信成本。QSGD的基线变体提供了强有力的理论保障,然而,出于实际目的,作者们提出了一种我们称之为QSGDinf的超常变体,它显示了在大型神经网络的分布培训中所取得的令人印象深刻的经验性收益。在本文件中,我们以这项工作为基础提出一个新的梯度四分化计划,并表明它既有比QSGD的更强大的理论保障,而且与QSGDinf的超自然法和其他压缩方法的实验性能相匹配和超过。

0

相关内容

SGD

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

专知会员服务

158+阅读 · 2020年1月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

CVer

21+阅读 · 2020年6月20日

ICLR 2020会议的16篇最佳深度学习论文

ICLR 2020会议的16篇最佳深度学习论文

AINLP

5+阅读 · 2020年5月12日

深度学习模型剪枝：Slimmable Networks三部曲

深度学习模型剪枝：Slimmable Networks三部曲

极市平台

3+阅读 · 2020年2月22日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

神经网络训练tricks

神经网络训练tricks

极市平台

6+阅读 · 2019年4月15日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Variational Bayes method for ordinary differential equation models

Arxiv

0+阅读 · 2021年8月8日

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Arxiv

0+阅读 · 2021年8月8日

Central limit theorems for high dimensional dependent data

Arxiv

0+阅读 · 2021年8月7日

The Faddeev-LeVerrier algorithm and the Pfaffian

Arxiv

0+阅读 · 2021年8月5日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Optimization for deep learning: theory and algorithms

Optimization for deep learning: theory and algorithms

Arxiv

106+阅读 · 2019年12月19日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Arxiv

4+阅读 · 2018年4月22日

VIP会员

文章信息

相关主题

随机梯度下降

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

《算法凸几何》简明书，Algorithmic Convex Geometry，50页pdf

专知会员服务

42+阅读 · 2021年4月2日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

经典书《斯坦福大学-多智能体系统》532页pdf，MULTIAGENT SYSTEMS Algorithmic, Game-Theoretic, and Logical Foundations

专知会员服务

158+阅读 · 2020年1月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

PyTorch Parallel Training（单机多卡并行、混合精度、同步BN训练指南文档）

CVer

21+阅读 · 2020年6月20日

ICLR 2020会议的16篇最佳深度学习论文

ICLR 2020会议的16篇最佳深度学习论文

AINLP

5+阅读 · 2020年5月12日

深度学习模型剪枝：Slimmable Networks三部曲

深度学习模型剪枝：Slimmable Networks三部曲

极市平台

3+阅读 · 2020年2月22日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

神经网络训练tricks

神经网络训练tricks

极市平台

6+阅读 · 2019年4月15日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Variational Bayes method for ordinary differential equation models

Arxiv

0+阅读 · 2021年8月8日

Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications

Arxiv

0+阅读 · 2021年8月8日

Central limit theorems for high dimensional dependent data

Arxiv

0+阅读 · 2021年8月7日

The Faddeev-LeVerrier algorithm and the Pfaffian

Arxiv

0+阅读 · 2021年8月5日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Optimization for deep learning: theory and algorithms

Optimization for deep learning: theory and algorithms

Arxiv

106+阅读 · 2019年12月19日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Arxiv

4+阅读 · 2018年4月22日

微信扫码咨询专知VIP会员