Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders


May 30, 2023


Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.
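The three data orderings compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `data_order` and `sgd`, the scheme labels, and the toy quadratic objective are assumptions chosen for illustration. The toy objective is convex, so the sketch only shows how the index sequences differ and says nothing about the paper's non-convex rates.

```python
import random


def data_order(n, epochs, scheme, seed=0):
    """Return a list of n * epochs sample indices for the given ordering scheme.

    Hypothetical helper for illustration; the scheme names are assumptions.
    """
    rng = random.Random(seed)
    order = []
    if scheme == "with_replacement":
        # Classical SGD: every step draws an index uniformly and independently.
        order = [rng.randrange(n) for _ in range(n * epochs)]
    elif scheme == "single_shuffle":
        # Single Shuffle (SS): one permutation, reused in every epoch.
        perm = list(range(n))
        rng.shuffle(perm)
        for _ in range(epochs):
            order.extend(perm)
    elif scheme == "random_reshuffling":
        # Random Reshuffling (RR): a fresh permutation at the start of each epoch.
        for _ in range(epochs):
            perm = list(range(n))
            rng.shuffle(perm)
            order.extend(perm)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return order


def sgd(targets, order, lr=0.1, x0=0.0):
    """Run SGD on the toy objective f(x) = (1/n) * sum_i 0.5 * (x - targets[i])**2.

    The samples are visited in exactly the sequence given by `order`.
    """
    x = x0
    for i in order:
        grad = x - targets[i]  # gradient of the i-th component function
        x -= lr * grad
    return x


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0]
    for scheme in ("with_replacement", "single_shuffle", "random_reshuffling"):
        order = data_order(len(targets), epochs=20, scheme=scheme, seed=0)
        print(scheme, sgd(targets, order))
```

With shuffling, every sample is visited exactly once per epoch. With replacement, some samples may be drawn several times in an epoch and others not at all.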
