Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis


Stochastic gradient descent · Noise · Machine Learning · Objective function · SGD

May 4, 2021


Stephan Wojtowytsch

Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many theoretical analyses of stochastic gradient algorithms. In this article, we discuss some of the common properties of energy landscapes and stochastic noise encountered in machine learning problems, and how they affect SGD-based optimization. In particular, we show that the learning rate in SGD with machine learning noise can be chosen to be small, but uniformly positive for all times if the energy landscape resembles that of overparametrized deep learning problems. If the objective function satisfies a Lojasiewicz inequality, SGD converges to the global minimum exponentially fast, and even for functions which may have local minima, we establish almost sure convergence to the global minimum at an exponential rate from any finite energy initialization. The assumptions that we make in this result concern the behavior where the objective function is either small or large and the nature of the gradient noise, but the energy landscape is fairly unconstrained on the domain where the objective function takes values in an intermediate regime.
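The behavior described in the abstract can be illustrated with a small numerical sketch. It is not the paper's setting or proof: it assumes an overparametrized least-squares problem (more parameters than data points, so zero loss is attainable) as a stand-in for the energy landscapes the abstract mentions, and it assumes that "machine learning noise" means gradient noise whose variance is bounded by a multiple of the objective value. All names (`make_problem`, `run_sgd`, `noise_variance`) and all parameter values are hypothetical choices made for this illustration.

```python
import numpy as np

def make_problem(n=20, d=100, seed=0):
    """Random least-squares data with d > n, so an exact fit exists."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)
    return X, y

def loss(w, X, y):
    """Full-batch objective f(w) = (1/2n) * sum_i (x_i . w - y_i)^2."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def noise_variance(w, X, y):
    """Mean squared deviation of single-sample gradients from the full gradient."""
    r = X @ w - y
    per_sample = r[:, None] * X          # row i: gradient of 0.5 * (x_i . w - y_i)^2
    full = per_sample.mean(axis=0)
    return np.mean(np.sum((per_sample - full) ** 2, axis=1))

def run_sgd(X, y, lr=0.005, steps=4000, seed=1):
    """Single-sample SGD with a constant (not decaying) learning rate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    history = [(loss(w, X, y), noise_variance(w, X, y))]
    for t in range(steps):
        i = rng.integers(n)                       # pick one data point uniformly
        w -= lr * (X[i] @ w - y[i]) * X[i]        # step along its gradient
        if (t + 1) % 1000 == 0:
            history.append((loss(w, X, y), noise_variance(w, X, y)))
    return w, history

if __name__ == "__main__":
    X, y = make_problem()
    _, history = run_sgd(X, y)
    for f, var in history:
        print(f"loss={f:.3e}  noise variance={var:.3e}")
```

In this toy problem the noise variance is at most `2 * max_i ||x_i||^2` times the loss, so the noise vanishes as the loss approaches zero. That is why the learning rate can stay fixed here instead of being decayed to zero, in contrast with analyses that assume noise of constant magnitude.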


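Related topic: stochastic gradient descent

Stochastic gradient descent draws a minibatch of m samples from the data-generating distribution and updates the parameters using the average of their gradients.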
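Written out, the minibatch update reads as follows. The symbols are notation assumed here rather than taken from the page: $\theta_t$ for the parameters at step $t$, $\eta$ for the learning rate, $\ell$ for the per-sample loss, and $z_1, \dots, z_m$ for the sampled data points.

```latex
g_t = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \, \ell(\theta_t; z_i),
\qquad
\theta_{t+1} = \theta_t - \eta \, g_t .
```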
