关于超参数性线性倒退适应方法的一般化 (On Generalization of Adaptive Methods for Over-parameterized Linear Regression) - 专知论文

会员服务 ·

0

泛化理论 · 线性回归 · 线性的 · Neural Networks · 极小点 ·

2020 年 11 月 28 日

On Generalization of Adaptive Methods for Over-parameterized Linear Regression

翻译：关于超参数性线性倒退适应方法的一般化

Vatsal Shah,Soumya Basu,Anastasios Kyrillidis,Sujay Sanghavi

from arxiv, arXiv admin note: substantial text overlap with arXiv:1811.07055

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on these areas in the quest to understand -- why do neural networks generalize well? The setting of over-parameterized linear regression has provided key insights into understanding this mysterious behavior of neural networks. In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting. First, we focus on two sub-classes of adaptive methods depending on their generalization performance. For the first class of adaptive methods, the parameter vector remains in the span of the data and converges to the minimum norm solution like gradient descent (GD). On the other hand, for the second class of adaptive methods, the gradient rotation caused by the pre-conditioner matrix results in an in-span component of the parameter vector that converges to the minimum norm solution and the out-of-span component that saturates. Our experiments on over-parameterized linear regression and deep neural networks support this theory.

翻译：在过去十年中,过度参数化和适应性方法在深层学习的成功中发挥了关键作用。过度参数化的广泛使用迫使我们重新思考一般化,提出新的现象,例如优化算法的隐性正规化和与培训进展的双向下降。最近的一系列工作开始揭示这些领域,以寻求理解 -- -- 为什么神经网络能够很好地普及?过分参数化线性回归的设置为理解神经网络的这种神秘行为提供了关键的洞察力。在本文中,我们的目标是说明过度参数化线性回归设置中适应方法的性能。首先,我们注重适应方法的两个亚类,视其一般化性能而定。关于第一组适应方法,参数矢量处在数据范围内,与最起码的规范解决方案如梯度下降(GD)相融合。另一方面,对于第二类适应方法,由预参数矩阵引起的梯度旋转导致的参数矢量在泛星部分中的结果,该参数矢量值与最低限度标准解决方案相融合,外线性线性回归系统支持这一深层线性理论。

0

相关内容

泛化理论

【AAAI2021】层次图胶囊网络

【AAAI2021】层次图胶囊网络

专知会员服务

84+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Arxiv

0+阅读 · 2021年1月18日

Local Search Algorithms for Rank-Constrained Convex Optimization

Arxiv

0+阅读 · 2021年1月15日

Random and quasi-random designs in group testing

Random and quasi-random designs in group testing

Arxiv

0+阅读 · 2021年1月15日

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Arxiv

16+阅读 · 2020年3月12日

A Survey on The Expressive Power of Graph Neural Networks

A Survey on The Expressive Power of Graph Neural Networks

Arxiv

7+阅读 · 2020年3月9日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

【AAAI2021】层次图胶囊网络

【AAAI2021】层次图胶囊网络

专知会员服务

84+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

俄乌战争：无人机作战

《人类就绪水平在人工智能密集型系统中的应用》最新文献

《迈向全自主超轻型无人机》2025最新124页

《自动化战略情报治理》最新文献

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Arxiv

0+阅读 · 2021年1月18日

Local Search Algorithms for Rank-Constrained Convex Optimization

Arxiv

0+阅读 · 2021年1月15日

Random and quasi-random designs in group testing

Random and quasi-random designs in group testing

Arxiv

0+阅读 · 2021年1月15日

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Arxiv

16+阅读 · 2020年3月12日

A Survey on The Expressive Power of Graph Neural Networks

A Survey on The Expressive Power of Graph Neural Networks

Arxiv

7+阅读 · 2020年3月9日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员