We introduce MaSS (Momentum-added Stochastic Solver), an accelerated SGD method for optimizing over-parametrized models. Our method is simple and efficient to implement and does not require adapting hyper-parameters or computing full gradients in the course of optimization. Experimental evaluation of MaSS on several standard deep network architectures, including ResNet and convolutional networks, shows improved performance over Adam and SGD in both optimization and generalization. We prove accelerated convergence of MaSS over SGD and provide an analysis of hyper-parameter selection in the quadratic case, as well as some results in the general strongly convex setting. In contrast, we show theoretically and verify empirically that standard SGD+Nesterov can diverge for common choices of hyper-parameter values. We also analyze the practically important question of how the convergence rate and optimal hyper-parameters depend on the mini-batch size, demonstrating three distinct regimes: linear scaling, diminishing returns, and saturation.
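For concreteness, the following is a minimal sketch of what a momentum-added stochastic update of this kind can look like on a toy interpolating least-squares problem. The exact MaSS update rule and its hyper-parameter settings are specified in the paper body, not in this abstract; the two step sizes and the momentum term (named `eta1`, `eta2`, `gamma` here), their values, and the sign of the compensation term are all illustrative assumptions.

```python
import numpy as np

# Sketch of a momentum-added SGD update (assumed form, not the paper's
# definitive algorithm) on a realizable least-squares problem, where the
# model can fit the data exactly, as in the over-parametrized setting.

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
b = A @ rng.normal(size=20)          # realizable target: zero loss is attainable

def stochastic_grad(w, batch_size=10):
    """Mini-batch gradient of the least-squares loss 0.5*||A w - b||^2 / n."""
    idx = rng.integers(0, A.shape[0], size=batch_size)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ w - bb) / batch_size

w = np.zeros(20)                      # primary iterate
u = w.copy()                          # auxiliary (momentum) iterate
eta1, eta2, gamma = 1e-2, 1e-3, 0.9   # assumed hyper-parameter values

for _ in range(2000):
    g = stochastic_grad(u)
    w_next = u - eta1 * g             # SGD step taken from the auxiliary point
    # Nesterov-style momentum plus an extra eta2-scaled gradient term
    # (the "added" compensation; its exact form is an assumption here):
    u = w_next + gamma * (w_next - w) + eta2 * g
    w = w_next

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```

Setting `eta2 = 0` in this sketch recovers plain SGD+Nesterov, which is the baseline the abstract contrasts against; the additional term is what distinguishes a MaSS-style update from it.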