Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors - Zhuanzhi (专知) Papers

Linear · AdaGrad · Optimizers · Stochastic Gradient Descent · Generalization Theory

October 6, 2021

Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors

Adityanarayanan Radhakrishnan,Mikhail Belkin,Caroline Uhler

The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings. While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more general optimization methods. In this work, we present a PL-based analysis for linear convergence of generalized mirror descent (GMD), a generalization of mirror descent with a possibly time-dependent mirror. GMD subsumes popular first order optimization methods including gradient descent, mirror descent, and preconditioned gradient descent methods such as Adagrad. Since the standard PL analysis cannot be extended naturally from GMD to stochastic GMD, we present a Taylor-series based analysis to establish sufficient conditions for linear convergence of stochastic GMD. As a corollary, our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad. Lastly, for functions that are locally PL*, our analysis implies existence of an interpolating solution and convergence of GMD to this solution.
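
The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It assumes a GMD step solves phi_t(w_next) = phi_t(w) - lr * grad f(w) for an invertible, possibly time-dependent mirror map phi_t, so that phi_t(w) = w recovers gradient descent and phi_t(w) = G_t^{1/2} w (with G_t the diagonal sum of squared gradients) recovers diagonal Adagrad. It also uses the PL* form of the inequality, (1/2) * ||grad f(w)||^2 >= mu * f(w). The toy overparameterized least-squares problem, the function names, and the step sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def gmd_step(w, g, phi, phi_inv, lr):
    """One assumed GMD step: solve phi(w_next) = phi(w) - lr * g for w_next."""
    return phi_inv(phi(w) - lr * g)

def identity_mirror(sq_grad_sum):
    """phi(w) = w: the step reduces to plain gradient descent."""
    return (lambda w: w), (lambda z: z)

def adagrad_mirror(sq_grad_sum, eps=1e-8):
    """phi_t(w) = G_t^{1/2} w with diagonal G_t: the step reduces to diagonal Adagrad."""
    scale = np.sqrt(sq_grad_sum) + eps
    return (lambda w: scale * w), (lambda z: z / scale)

def run_gmd(loss, grad, w0, make_mirror, lr, steps):
    """Run GMD and record the loss at every iterate."""
    w = w0.copy()
    sq_grad_sum = np.zeros_like(w0)
    losses = [loss(w)]
    for _ in range(steps):
        g = grad(w)
        sq_grad_sum = sq_grad_sum + g * g  # state that makes the mirror time-dependent
        phi, phi_inv = make_mirror(sq_grad_sum)
        w = gmd_step(w, g, phi, phi_inv, lr)
        losses.append(loss(w))
    return w, np.array(losses)

# Hypothetical toy problem: f(w) = 0.5 * ||A w - b||^2 with more unknowns than
# equations. It is not strongly convex, but it is PL* with mu = lambda_min(A A^T)
# and L-smooth with L = lambda_max(A A^T), and an interpolating solution exists.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
b = rng.standard_normal(5)
loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b)

eigs = np.linalg.eigvalsh(A @ A.T)  # eigenvalues in ascending order
mu, L = eigs[0], eigs[-1]

w0 = np.zeros(20)
_, gd_losses = run_gmd(loss, grad, w0, identity_mirror, lr=1.0 / L, steps=200)
_, ada_losses = run_gmd(loss, grad, w0, adagrad_mirror, lr=0.1, steps=200)

print("PL bound on the per-step loss ratio for GD:", 1.0 - mu / L)
print("GD loss, first and last:", gd_losses[0], gd_losses[-1])
print("Adagrad loss, first and last:", ada_losses[0], ada_losses[-1])
```

For the gradient descent case with step size 1/L, the standard PL argument gives f(w_next) <= (1 - mu/L) * f(w), which is the linear rate the printed bound refers to. The Adagrad run is included only to show how a time-dependent mirror plugs into the same step; its step size of 0.1 is an arbitrary choice and is not one of the learning rates derived in the paper.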
