Adaptive gradient methods such as AdaGrad are widely used to optimize neural networks. Yet existing convergence guarantees for adaptive gradient methods require either convexity or smoothness, and, in the smooth setting, only guarantee convergence to a stationary point. We propose an adaptive gradient method and show that, for two-layer over-parameterized neural networks, if the width is sufficiently large (polynomially), then the proposed method converges \emph{to the global minimum} in polynomial time, and convergence is robust, \emph{without the need to fine-tune hyper-parameters such as the step-size schedule and with the level of over-parametrization independent of the training error}. Our analysis indicates in particular that over-parametrization is crucial for harnessing the full potential of adaptive gradient methods in the setting of neural networks.
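For context, the abstract does not spell out the update rule of the proposed method; as a minimal reference sketch, the standard diagonal AdaGrad update (the family the method belongs to) keeps a per-coordinate running sum of squared gradients and scales each coordinate's step size accordingly:
\[
  v_{t,i} = v_{t-1,i} + g_{t,i}^{2}, \qquad
  \theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{v_{t,i}} + \epsilon}\, g_{t,i},
\]
where $g_t = \nabla_{\theta} \ell(\theta_t)$ is the (stochastic or full) gradient, $\eta > 0$ is a base step size, and $\epsilon > 0$ is a small constant for numerical stability; the symbols $v$, $\eta$, and $\epsilon$ are notation introduced here only for illustration and need not match the paper's.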