Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients - Zhuanzhi (专知) Papers

Adam · Variance · Generalization theory · Estimation/Estimator · Weight

December 13, 2020

Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

Lukas Balles, Philipp Hennig

From arXiv. Presented at the 35th International Conference on Machine Learning (ICML), 2018.

The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner's toolbox for problems where ADAM fails.
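The sign/magnitude split described in the abstract can be checked numerically. The sketch below is not the authors' code. It assumes per-coordinate, bias-corrected moment estimates `m` (mean of the gradient) and `v` (second moment), takes ε = 0, and treats `s2 = v - m^2` as the variance estimate, so the relative variance is `eta = s2 / m^2`. Under that assumption the Adam direction `m / sqrt(v)` equals `sign(m) * sqrt(1 / (1 + eta))`: the sign sets the direction and the relative variance sets the step size. The last function is a hypothetical, simplified "variance adaptation without the sign" step using the factor `m^2 / v = 1 / (1 + eta)`; the method the paper actually proposes (M-SVAG) adds details, such as a corrected variance estimate, that are not reproduced here. All function names and the example numbers are illustrative.

```python
import math


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def adam_direction(m, v, eps=0.0):
    """Per-coordinate Adam direction m / (sqrt(v) + eps), with bias-corrected m, v."""
    return [mi / (math.sqrt(vi) + eps) for mi, vi in zip(m, v)]


def sign_and_magnitude(m, v):
    """Rewrite m / sqrt(v) as sign(m) * sqrt(1 / (1 + eta)).

    Assumption: v estimates E[g^2] = mean^2 + variance, so the variance
    estimate is s2 = v - m^2 and the relative variance is eta = s2 / m^2.
    Requires m != 0 and v > 0 in every coordinate.
    """
    signs, magnitudes = [], []
    for mi, vi in zip(m, v):
        s2 = vi - mi * mi        # estimated gradient variance
        eta = s2 / (mi * mi)     # estimated relative variance
        signs.append(sign(mi))
        magnitudes.append(math.sqrt(1.0 / (1.0 + eta)))  # equals |m| / sqrt(v)
    return signs, magnitudes


def variance_adapted_sgd_step(theta, m, v, lr):
    """Hypothetical simplified sketch: step along m itself (no sign) and
    scale each coordinate by gamma = m^2 / v = 1 / (1 + eta)."""
    new_theta = []
    for th, mi, vi in zip(theta, m, v):
        # Clamp to 1: EMA estimates with different decay rates do not guarantee v >= m^2.
        gamma = min(1.0, mi * mi / vi) if vi > 0 else 0.0
        new_theta.append(th - lr * gamma * mi)
    return new_theta


if __name__ == "__main__":
    m = [0.5, -0.2, 0.01]   # illustrative moment estimates, not data from the paper
    v = [0.3, 0.05, 0.04]
    signs, mags = sign_and_magnitude(m, v)
    print(adam_direction(m, v))
    print([s * a for s, a in zip(signs, mags)])  # same values as the line above
    print(variance_adapted_sgd_step([0.0, 0.0, 0.0], m, v, lr=1.0))
```

In the example, the third coordinate has a small mean relative to its spread (eta = 399), so both Adam and the variance-adapted step shrink its update, while the low-variance first coordinate keeps a factor close to 1.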
