Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent - 专知 (Zhuanzhi) Papers


CC · Statistic · Loss function (machine learning) · Functional · Empirical loss

October 15, 2021

Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent


Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

From arXiv. The first three authors contributed equally. 40 pages, 4 figures.

We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under two sets of assumptions: generalized smoothness and Łojasiewicz conditions on the population loss function, i.e., the limit of the empirical loss function as the sample size goes to infinity; and a stability condition between the gradients of the empirical and population loss functions, i.e., polynomial growth of the concentration bound between the gradients of the sample and population loss functions. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after a number of iterations that is logarithmic in the sample size. This is computationally cheaper than fixed-step size gradient descent, which needs a number of iterations polynomial in the sample size to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate our general theory with three statistical examples: the generalized linear model, the mixture model, and the mixed linear regression model.
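To make the contrast in the abstract concrete, here is a minimal noiseless sketch rather than the paper's setting or code. It assumes a toy loss f(θ) = ‖θ‖⁴, which is minimized at θ = 0 but is not locally strongly convex there, and it assumes the minimum value f* = 0 is known. The names `loss`, `grad`, `polyak_gd`, and `fixed_step_gd` are hypothetical. The Polyak step size used is the classical η_t = (f(θ_t) − f*) / ‖∇f(θ_t)‖².

```python
import numpy as np


def loss(theta):
    # Toy loss f(theta) = ||theta||^4 (an assumption for illustration):
    # its Hessian vanishes at the minimizer, so it is not locally strongly convex.
    return float(np.sum(theta ** 2) ** 2)


def grad(theta):
    # Gradient of ||theta||^4 is 4 * ||theta||^2 * theta.
    return 4.0 * np.sum(theta ** 2) * theta


def polyak_gd(theta0, num_iters, loss_min=0.0):
    # Gradient descent with the classical Polyak step size
    # eta_t = (f(theta_t) - f*) / ||grad f(theta_t)||^2, with f* assumed known.
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(num_iters):
        g = grad(theta)
        g_sq = float(g @ g)
        if g_sq == 0.0:  # already at a stationary point
            break
        step = (loss(theta) - loss_min) / g_sq
        theta = theta - step * g
    return theta


def fixed_step_gd(theta0, num_iters, step=0.01):
    # Plain gradient descent with a constant step size, for comparison.
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(num_iters):
        theta = theta - step * grad(theta)
    return theta


if __name__ == "__main__":
    start = np.array([1.0, -0.5])
    for name, method in [("polyak", polyak_gd), ("fixed", fixed_step_gd)]:
        final = method(start, 50)
        print(name, np.linalg.norm(final))
```

On this toy loss the Polyak update simplifies to θ ← (3/4)·θ, so the distance to the minimizer shrinks geometrically, while the fixed-step iterates shrink only at a polynomial rate in the iteration count. In the paper's statistical setting the loss is an empirical one built from samples, which this sketch does not model.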

