Dropout的隐式正则化 (Implicit regularization of dropout) - 专知论文

会员服务 ·

0

暂退法 · 隐式正则化 · 正则化 · 神经网络 · 泛化 ·

2023 年 4 月 10 日

Implicit regularization of dropout

翻译：Dropout的隐式正则化

Zhongwang Zhang,Zhi-Qin John Xu

from arxiv, arXiv admin note: text overlap with arXiv:2111.01022

It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training. In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments. Additionally, we numerically study two implications of the implicit regularization, which intuitively rationalizes why dropout helps generalization. Firstly, we find that input weights of hidden neurons tend to condense on isolated orientations trained with dropout. Condensation is a feature in the non-linear learning process, which makes the network less complex. Secondly, we experimentally find that the training with dropout leads to the neural network with a flatter minimum compared with standard gradient descent training, and the implicit regularization is the key to finding flat solutions. Although our theory mainly focuses on dropout used in the last hidden layer, our experiments apply to general dropout in training neural networks. This work points out a distinct characteristic of dropout compared with stochastic gradient descent and serves as an important basis for fully understanding dropout.

翻译：在神经网络训练中，了解dropout这种常用正则化方法如何帮助实现良好的泛化解决方案非常重要。在这项工作中，我们推导了dropout的隐式正则化的理论，并通过一系列实验进行了验证。此外，我们还对隐式正则化的两个含义进行了数值研究，从直观上说明了为什么dropout有助于泛化。首先，我们发现隐藏神经元的输入权重在dropout训练过程中往往会凝聚在孤立的方向上。在非线性学习过程中，凝聚是一个减少网络复杂性的特征。其次，我们通过实验发现，dropout训练会导致神经网络的最小值相对于标准梯度下降训练具有更平的形状，并且隐式正则化是发现平坦解的关键。虽然我们的理论主要集中在在最后的隐藏层中使用的dropout上，但我们的实验适用于训练神经网络中的一般性dropout。这项工作指出了dropout相比于随机梯度下降的明显特征，并为充分理解dropout提供了重要基础。

0

相关内容

暂退法

【深度学习中的隐式正则化】从矩阵和张量分解中得到的教训，141页ppt

【深度学习中的隐式正则化】从矩阵和张量分解中得到的教训，141页ppt

专知会员服务

58+阅读 · 2021年4月5日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

【CVPR2020】在线深度聚类的无监督表示学习, Online Deep Clustering for Unsupervised Representation Learning

【CVPR2020】在线深度聚类的无监督表示学习, Online Deep Clustering for Unsupervised Representation Learning

专知会员服务

69+阅读 · 2020年6月19日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新5篇度量学习（Metric Learning）相关论文—人脸验证、BIER、自适应图卷积、注意力机制、单次学习

【论文推荐】最新5篇度量学习（Metric Learning）相关论文—人脸验证、BIER、自适应图卷积、注意力机制、单次学习

专知

17+阅读 · 2018年2月11日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

隐丹参酮通过抑制STAT3酪氨酸磷酸化抗恶性神经胶质瘤增殖及其分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

再生核希尔伯特空间图像稀疏表达算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Connexin32在调控EMT影响肝细胞癌奥沙利铂化疗敏感性中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

雷帕霉素复合物1在巨噬细胞炎症反应中的作用与机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

泛素化蛋白酶A20抑制caspase-8活化在脑肿瘤干细胞发生TRAIL抵抗中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

抗氧化蛋白Trx2在脂联素抑制肝细胞癌过程中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

肿瘤细胞中凋亡抑制蛋白CFLAR乙酰化调控的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

石榴鞣花酸调控胆固醇代谢的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Oct4对FSH促进上皮性卵巢癌生长、转移信号转导的调控作用

国家自然科学基金

0+阅读 · 2011年12月31日

无/低cyclins肿瘤细胞群(NCCCs)：一种新的肿瘤细胞亚群？

国家自然科学基金

0+阅读 · 2011年12月31日

Manifold Regularization for Memory-Efficient Training of Deep Neural Networks

Arxiv

0+阅读 · 2023年5月26日

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Arxiv

0+阅读 · 2023年5月26日

Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

Arxiv

0+阅读 · 2023年5月25日

Vector-Valued Variation Spaces and Width Bounds for DNNs: Insights on Weight Decay Regularization

Arxiv

0+阅读 · 2023年5月25日

Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Arxiv

0+阅读 · 2023年5月25日

Criteria for the construction of MDS convolutional codes with good column distances

Arxiv

0+阅读 · 2023年5月25日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

W-net: Bridged U-net for 2D Medical Image Segmentation

W-net: Bridged U-net for 2D Medical Image Segmentation

Arxiv

20+阅读 · 2018年7月12日

VIP会员

文章信息

相关主题

隐式正则化

相关VIP内容

【深度学习中的隐式正则化】从矩阵和张量分解中得到的教训，141页ppt

【深度学习中的隐式正则化】从矩阵和张量分解中得到的教训，141页ppt

专知会员服务

58+阅读 · 2021年4月5日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

【ICML2020】拉普拉斯正则化小样本学习，Laplacian Regularized Few-Shot Learning

专知会员服务

77+阅读 · 2020年6月28日

【CVPR2020】在线深度聚类的无监督表示学习, Online Deep Clustering for Unsupervised Representation Learning

【CVPR2020】在线深度聚类的无监督表示学习, Online Deep Clustering for Unsupervised Representation Learning

专知会员服务

69+阅读 · 2020年6月19日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【EMNLP2025最佳论文】INFINI-GRAM MINI：基于 FM-Index 的互联网级精确 n-gram 搜索

【EMNLP2025教程】高效的大语言模型推理：算法、模型与系统，203页ppt

AI医疗行业研究报告：AI医疗前景广阔

【斯坦福博士论文】多模态基础模型：从科学理解到科学发现

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新5篇度量学习（Metric Learning）相关论文—人脸验证、BIER、自适应图卷积、注意力机制、单次学习

【论文推荐】最新5篇度量学习（Metric Learning）相关论文—人脸验证、BIER、自适应图卷积、注意力机制、单次学习

专知

17+阅读 · 2018年2月11日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

相关论文

Manifold Regularization for Memory-Efficient Training of Deep Neural Networks

Arxiv

0+阅读 · 2023年5月26日

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Arxiv

0+阅读 · 2023年5月26日

Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

Arxiv

0+阅读 · 2023年5月25日

Vector-Valued Variation Spaces and Width Bounds for DNNs: Insights on Weight Decay Regularization

Arxiv

0+阅读 · 2023年5月25日

Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Arxiv

0+阅读 · 2023年5月25日

Criteria for the construction of MDS convolutional codes with good column distances

Arxiv

0+阅读 · 2023年5月25日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

W-net: Bridged U-net for 2D Medical Image Segmentation

W-net: Bridged U-net for 2D Medical Image Segmentation

Arxiv

20+阅读 · 2018年7月12日

相关基金

隐丹参酮通过抑制STAT3酪氨酸磷酸化抗恶性神经胶质瘤增殖及其分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

再生核希尔伯特空间图像稀疏表达算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Connexin32在调控EMT影响肝细胞癌奥沙利铂化疗敏感性中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

雷帕霉素复合物1在巨噬细胞炎症反应中的作用与机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

泛素化蛋白酶A20抑制caspase-8活化在脑肿瘤干细胞发生TRAIL抵抗中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

抗氧化蛋白Trx2在脂联素抑制肝细胞癌过程中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

肿瘤细胞中凋亡抑制蛋白CFLAR乙酰化调控的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

石榴鞣花酸调控胆固醇代谢的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Oct4对FSH促进上皮性卵巢癌生长、转移信号转导的调控作用

国家自然科学基金

0+阅读 · 2011年12月31日

无/低cyclins肿瘤细胞群(NCCCs)：一种新的肿瘤细胞亚群？

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员