ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation - 专知 (Zhuanzhi) Papers


August 23, 2021


Zhize Li, Slavomír Hanzely, Peter Richtárik

From arXiv, 28 pages.

We propose ZeroSARAH, a novel variant of the variance-reduced method SARAH (Nguyen et al., 2017), for minimizing the average of a large number of nonconvex functions $\frac{1}{n}\sum_{i=1}^{n}f_i(x)$. To the best of our knowledge, in this nonconvex finite-sum regime, all existing variance-reduced methods, including SARAH, SVRG, SAGA and their variants, need to compute the full gradient over all $n$ data samples at the initial point $x^0$, and then (for SVRG, SARAH and their variants) recompute the full gradient periodically, once every few iterations. Note that SVRG, SAGA and their variants typically achieve weaker convergence results than variants of SARAH: $n^{2/3}/\epsilon^2$ vs. $n^{1/2}/\epsilon^2$. Thus, we focus on variants of SARAH. The proposed ZeroSARAH and its distributed variant D-ZeroSARAH are the first variance-reduced algorithms that do not require any full gradient computations, not even at the initial point. Moreover, for both the standard and distributed settings, we show that ZeroSARAH and D-ZeroSARAH obtain new state-of-the-art convergence results, which can improve on the previous best-known results (given by, e.g., SPIDER, SARAH, and PAGE) in certain regimes. Avoiding full gradient computations, which are time-consuming steps, is important in many applications because the number of data samples $n$ is usually very large. This is especially true in the distributed setting, where periodically computing the full gradient over all data samples requires periodically synchronizing all clients/devices/machines, which may be impossible or unaffordable. Thus, we expect ZeroSARAH/D-ZeroSARAH to have a practical impact in distributed and federated learning, where full device participation is impractical.
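
For reference, the periodic full-gradient pattern that the abstract attributes to SARAH can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the paper's code: the scalar toy problem f_i(x) = 0.5(x - a_i)^2 + c_i cos(x) is hypothetical (it is nonconvex because f_i''(x) = 1 - c_i cos(x) < 0 near x = 0 when c_i > 1), and the step size eta, the inner-loop length m = n, and returning the last iterate are our own practical choices. The recursion itself follows SARAH (Nguyen et al., 2017): restart from a full gradient, then update v_t = grad f_i(x_t) - grad f_i(x_{t-1}) + v_{t-1}. The counter grad_evals makes the cost of the full-gradient passes visible.

```python
import math
import random


def make_problem(n, seed=0):
    """Hypothetical toy finite sum: f_i(x) = 0.5 * (x - a_i)**2 + c_i * cos(x).

    f_i''(x) = 1 - c_i * cos(x) is negative near x = 0 when c_i > 1,
    so every f_i (and their average) is nonconvex.
    """
    rng = random.Random(seed)
    a = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    c = [rng.uniform(1.0, 3.0) for _ in range(n)]

    def grad_i(i, x):
        # Derivative of f_i at x.
        return (x - a[i]) - c[i] * math.sin(x)

    return grad_i


def full_grad(grad_i, n, x):
    """(1/n) * sum_i grad f_i(x); costs n gradient evaluations."""
    return sum(grad_i(i, x) for i in range(n)) / n


def sarah(grad_i, n, x0, eta=0.05, epochs=30, inner=None, seed=1):
    """SARAH-style loop with a full gradient at the start of every epoch."""
    rng = random.Random(seed)
    inner = inner or n  # inner-loop length m; m = n is an assumed default
    x = x0
    grad_evals = 0
    for _ in range(epochs):
        # Periodic full gradient over all n samples (the step ZeroSARAH removes).
        v = full_grad(grad_i, n, x)
        grad_evals += n
        x_prev, x = x, x - eta * v
        for _ in range(inner - 1):
            i = rng.randrange(n)
            # Recursive estimator: v_t = grad f_i(x_t) - grad f_i(x_{t-1}) + v_{t-1}.
            v = grad_i(i, x) - grad_i(i, x_prev) + v
            grad_evals += 2
            x_prev, x = x, x - eta * v
    return x, grad_evals  # last iterate (a practical choice)


grad_i = make_problem(100)
x, evals = sarah(grad_i, 100, x0=0.5)
print(f"x = {x:.6f}, |grad f(x)| = {abs(full_grad(grad_i, 100, x)):.2e}, gradient evaluations = {evals}")
```

With n = 100 and 30 epochs the loop spends 30 x (100 + 2 x 99) = 8,940 gradient evaluations, of which 3,000 go to full-gradient passes; in the distributed setting each of those passes would also require synchronizing every client.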

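The complexity comparison quoted in the abstract, $n^{2/3}/\epsilon^2$ for SVRG/SAGA-type methods versus $n^{1/2}/\epsilon^2$ for SARAH-type methods, can be made concrete with a back-of-the-envelope calculation. The sketch below treats both expressions as orders of growth only (our assumption: constants and lower-order terms are dropped, so these are not exact gradient counts). Because $\epsilon$ cancels, the gap is $n^{2/3 - 1/2} = n^{1/6}$.

```python
def svrg_type_bound(n, eps):
    """Order of growth n^(2/3) / eps^2 quoted for SVRG/SAGA-type methods."""
    return n ** (2.0 / 3.0) / eps ** 2


def sarah_type_bound(n, eps):
    """Order of growth n^(1/2) / eps^2 quoted for SARAH-type methods."""
    return n ** 0.5 / eps ** 2


def ratio(n, eps):
    # eps cancels, so this equals n^(2/3 - 1/2) = n^(1/6).
    return svrg_type_bound(n, eps) / sarah_type_bound(n, eps)


for n in (10**3, 10**6, 10**9):
    print(f"n = {n:>13,}: SVRG/SAGA-type bound / SARAH-type bound = {ratio(n, 1e-2):.1f}")
```

For n = 10^3, 10^6 and 10^9 this prints ratios of about 3.2, 10.0 and 31.6: the advantage of SARAH-type methods grows, but only as the sixth root of n.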
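
To illustrate what "zero full gradient computation" means in practice, here is a hypothetical sketch of a SARAH-type loop that never evaluates the gradient over all n samples. It is not the authors' algorithm: the estimator (SARAH's recursive minibatch difference combined with a SAGA-style table y of stored per-sample gradients, mixed in with weight lam), the default lam = b/(2n), the minibatch size b, the step size eta, and the hypothetical toy problem f_i(x) = 0.5(x - a_i)^2 + c_i cos(x) are all our assumptions for illustration. Consult the paper for ZeroSARAH's actual update rule and parameter choices. Each step costs 2b gradient evaluations; full_grad is called only afterwards, to check the result.

```python
import math
import random


def make_problem(n, seed=0):
    """Hypothetical toy finite sum: f_i(x) = 0.5 * (x - a_i)**2 + c_i * cos(x)."""
    rng = random.Random(seed)
    a = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    c = [rng.uniform(1.0, 3.0) for _ in range(n)]

    def grad_i(i, x):
        return (x - a[i]) - c[i] * math.sin(x)

    return grad_i


def full_grad(grad_i, n, x):
    """Used only to check the result; the solver below never calls it."""
    return sum(grad_i(i, x) for i in range(n)) / n


def zero_full_grad_sarah(grad_i, n, x0, eta=0.05, b=10, lam=None, steps=3000, seed=1):
    """Hypothetical SARAH-type loop with no full gradient computation.

    Assumed estimator (illustrative only):
      v_k = mean_B[g_i(x_k) - g_i(x_{k-1})] + (1 - lam_k) * v_{k-1}
            + lam_k * (mean_B[g_i(x_{k-1}) - y_i] + mean_j y_j)
    where y_j is the last gradient stored for sample j (SAGA-style table, starts at 0).
    """
    rng = random.Random(seed)
    lam = b / (2 * n) if lam is None else lam  # assumed default mixing weight
    y = [0.0] * n   # stored per-sample gradients
    y_avg = 0.0     # mean of y, updated incrementally
    x_prev, x, v = x0, x0, 0.0
    grad_evals = 0
    for k in range(steps):
        batch = rng.sample(range(n), b)  # minibatch without replacement
        g_new = {i: grad_i(i, x) for i in batch}
        g_old = {i: grad_i(i, x_prev) for i in batch}
        grad_evals += 2 * b
        lam_k = 1.0 if k == 0 else lam  # so v_0 is a plain minibatch gradient
        sarah_term = sum(g_new[i] - g_old[i] for i in batch) / b
        saga_term = sum(g_old[i] - y[i] for i in batch) / b + y_avg
        v = sarah_term + (1.0 - lam_k) * v + lam_k * saga_term
        # Refresh the table for the sampled indices with grad f_i(x_k).
        for i in batch:
            y_avg += (g_new[i] - y[i]) / n
            y[i] = g_new[i]
        x_prev, x = x, x - eta * v
    return x, grad_evals


grad_i = make_problem(100)
x, evals = zero_full_grad_sarah(grad_i, 100, x0=0.5)
print(f"x = {x:.6f}, |grad f(x)| = {abs(full_grad(grad_i, 100, x)):.2e}, gradient evaluations = {evals}")
```

Starting the table at zero keeps the first step a plain minibatch gradient and the SAGA-style term unbiased, at the price of a noisier estimator until every sample has been visited at least once.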
