Stochastic-gradient-based optimization has been a core enabling methodology for large-scale problems in machine learning and related areas. Despite this progress, the gap between theory and practice remains significant: theoreticians pursue mathematical optimality at the cost of obtaining specialized procedures for different regimes (e.g., modulus of strong convexity, magnitude of target accuracy, signal-to-noise ratio), while practitioners cannot readily determine which regime is appropriate for their problem and seek broadly applicable algorithms that are reasonably close to optimal. To bridge these perspectives it is necessary to study algorithms that are adaptive to different regimes. We present the stochastically controlled stochastic gradient (SCSG) method for composite convex finite-sum optimization problems and show that SCSG is adaptive to both strong convexity and target accuracy. The adaptivity is achieved by batch variance reduction with adaptive batch sizes and a novel technique, which we refer to as geometrization, that sets the length of each epoch as a geometric random variable. The algorithm achieves strictly better theoretical complexity than other existing adaptive algorithms, while its tuning parameters depend only on the smoothness parameter of the objective.
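To make the geometrization idea concrete, the following is a minimal, hedged sketch of one SCSG-style epoch: a batch gradient serves as the variance-reduction anchor, and the number of inner SVRG-style steps is drawn from a geometric distribution rather than fixed. The function names (`scsg_epoch`, `grad_fn`) and parameter choices are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def scsg_epoch(w, grad_fn, n, batch_size, eta, p_inner):
    """One illustrative SCSG epoch (sketch, not the paper's exact algorithm).

    w          -- current iterate (ndarray)
    grad_fn    -- grad_fn(w, idx): average gradient over component indices idx
    n          -- number of components in the finite sum
    batch_size -- anchor batch size B
    eta        -- step size
    p_inner    -- success probability of the geometric epoch length
    """
    # Anchor: gradient over a random batch, used for variance reduction.
    batch = rng.choice(n, size=batch_size, replace=False)
    mu = grad_fn(w, batch)
    w_anchor = w.copy()

    # Geometrization: epoch length N ~ Geometric(p_inner), mean 1/p_inner.
    num_steps = rng.geometric(p_inner)
    for _ in range(num_steps):
        i = rng.integers(n, size=1)
        # SVRG-style variance-reduced stochastic gradient.
        v = grad_fn(w, i) - grad_fn(w_anchor, i) + mu
        w = w - eta * v
    return w
```

In this sketch, setting `p_inner = 1/batch_size` gives an expected epoch length equal to the anchor batch size, which mirrors the batch-to-epoch coupling the abstract alludes to; the randomized epoch length is what enables the telescoping arguments behind the adaptivity analysis.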