AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

The sharpness-aware minimization (SAM) optimizer has been extensively explored because it generalizes better when training deep neural networks: it introduces extra perturbation steps that flatten the loss landscape of deep learning models. Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without a theoretical guarantee, owing to the threefold difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We show theoretically that AdaSAM admits an $\mathcal{O}(1/\sqrt{bT})$ convergence rate, where $T$ is the number of iterations, which achieves the linear speedup property with respect to the mini-batch size $b$. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and the perturbed gradient, we introduce a delayed second-order momentum term that makes them independent when taking expectations during the analysis. We then bound these terms by showing that the adaptive learning rate lies within a limited range, which makes the analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate for SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAM optimizers.
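The abstract describes AdaSAM only at a high level. As a minimal sketch (not the authors' implementation), one update can be read as: compute a gradient, take the SAM perturbation step to $w + \rho\, g/\lVert g\rVert$, evaluate the perturbed gradient there, then feed that gradient into first- and second-order momentum before taking an adaptive step. The AMSGrad-style running max on the second moment, the absence of bias correction, the toy quadratic loss, and the names `adasam_step`, `init_state`, `rho`, `beta1` and `beta2` are all assumptions made for illustration. The delayed second-order momentum mentioned in the abstract is a device of the convergence analysis and is not modeled here.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def init_state(dim: int) -> Dict[str, Vector]:
    """Zero-initialized moment buffers (hypothetical helper)."""
    return {"m": [0.0] * dim, "v": [0.0] * dim, "v_hat": [0.0] * dim}


def adasam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    state: Dict[str, Vector],
    lr: float = 0.01,
    rho: float = 0.05,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Vector:
    """One AdaSAM-style update (sketch): SAM perturbation + AMSGrad-style moments."""
    # 1) Gradient at the current weights (a mini-batch gradient in practice).
    g = grad_fn(w)
    g_norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # 2) SAM perturbation step: ascend to w + rho * g / ||g||.
    w_adv = [wi + rho * gi / g_norm for wi, gi in zip(w, g)]
    # 3) Perturbed gradient, evaluated at the perturbed weights.
    g_sam = grad_fn(w_adv)
    # 4) First- and second-order momentum of the perturbed gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(state["m"], g_sam)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(state["v"], g_sam)]
    # AMSGrad-style running max: the effective rate lr / sqrt(v_hat) never increases.
    v_hat = [max(a, b) for a, b in zip(state["v_hat"], v)]
    state.update(m=m, v=v, v_hat=v_hat)
    # 5) Adaptive, momentum-driven descent step (no bias correction in this sketch).
    return [wi - lr * mi / (math.sqrt(vh) + eps) for wi, mi, vh in zip(w, m, v_hat)]


# Usage: a hypothetical quadratic f(w) = 0.5 * (w0^2 + 3 * w1^2) stands in for a loss.
curv = [1.0, 3.0]


def loss(w: Vector) -> float:
    return 0.5 * sum(c * wi * wi for c, wi in zip(curv, w))


def grad(w: Vector) -> Vector:
    return [c * wi for c, wi in zip(curv, w)]


w = [1.0, -2.0]
state = init_state(len(w))
initial_loss = loss(w)
for _ in range(500):
    w = adasam_step(w, grad, state)
final_loss = loss(w)
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Setting `rho=0.0` removes the perturbation and leaves a plain AMSGrad-style step (without bias correction), which makes it easy to isolate the effect of the SAM step on a given problem.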


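Related content

Adaptive learning

Adaptive learning, also known as adaptive teaching, is an educational method that uses computer algorithms to orchestrate the interaction with the learner and to deliver customized learning resources and learning activities that address each learner's unique needs. In professional learning settings, individuals can "test out" of some training so that the instruction they receive is new to them. Based on students' learning needs, as reflected in their answers to questions and in the tasks and experiences they complete, the computer generates educational material adapted to them. The technology draws on various fields of study and their offshoots, including computer science, artificial intelligence, psychometrics, education, psychology, and brain science.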
