A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes


Optimizers · Adam · Nesterov momentum · Neural Networks · Momentum

June 9, 2021


Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl

Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
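The layer-wise normalization mentioned in the abstract can be illustrated with a minimal sketch. The code below contrasts a plain heavy-ball momentum step with a variant that rescales each layer's gradient by the ratio of weight norm to gradient norm, which is the general idea behind LARS-style updates. The function names, the exact form of the ratio, the `eps` constant, and the hyperparameter values are assumptions made for illustration; this is not the precise update rule from the LARS or LAMB papers, nor the configuration the authors evaluated.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def heavy_ball_step(params, grads, velocity, lr=0.1, momentum=0.9):
    """Plain heavy-ball momentum: one global learning rate for every layer."""
    new_params, new_velocity = [], []
    for w, g, v in zip(params, grads, velocity):
        v_next = [momentum * vi + gi for vi, gi in zip(v, g)]
        new_velocity.append(v_next)
        new_params.append([wi - lr * vi for wi, vi in zip(w, v_next)])
    return new_params, new_velocity


def layerwise_scaled_step(params, grads, velocity, lr=0.1, momentum=0.9, eps=1e-9):
    """Heavy-ball step with a per-layer ratio ||w|| / ||g|| applied to the gradient.

    Illustrative simplification: no weight decay, no trust coefficient.
    """
    new_params, new_velocity = [], []
    for w, g, v in zip(params, grads, velocity):
        w_norm, g_norm = l2_norm(w), l2_norm(g)
        # Fall back to 1.0 when either norm is zero so the layer can still move.
        ratio = w_norm / (g_norm + eps) if w_norm > 0 and g_norm > 0 else 1.0
        v_next = [momentum * vi + ratio * gi for vi, gi in zip(v, g)]
        new_velocity.append(v_next)
        new_params.append([wi - lr * vi for wi, vi in zip(w, v_next)])
    return new_params, new_velocity


if __name__ == "__main__":
    # Two hypothetical "layers" with very different gradient scales.
    params = [[3.0, 4.0], [0.3, 0.4]]
    grads = [[0.6, 0.8], [30.0, 40.0]]
    velocity = [[0.0, 0.0], [0.0, 0.0]]

    plain, _ = heavy_ball_step(params, grads, velocity)
    scaled, _ = layerwise_scaled_step(params, grads, velocity)
    print("plain heavy-ball:", plain)
    print("layer-wise scaled:", scaled)
```

In this toy setup, the first step of the scaled variant changes each layer by roughly the same fraction of its weight norm, whereas the plain step lets the layer with the large gradient move far more than the other. Whether that extra normalization actually helps at large batch sizes is the question the paper examines.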

