Dropout regularization of deep neural networks has been a mysterious yet effective tool for preventing overfitting. Explanations for its success range from the prevention of "co-adapted" weights to its interpretation as a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e., dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given this equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for residual networks (ResNets), terming the prior "automatic depth determination," as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation on regression benchmarks.
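The reparametrization underlying the claimed equivalence can be illustrated with a minimal Monte Carlo sketch (not from the paper; the Gaussian-noise case is used purely for illustration): multiplying a weight θ by noise z ~ N(1, α) yields exactly the same distribution as drawing the weight from a Gaussian whose scale is tied to the weight's own magnitude, N(θ, αθ²) — a scale-mixture-style view of the noisy weight.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, alpha, n = 2.0, 0.25, 1_000_000  # illustrative values, not from the paper

# View 1: multiplicative Gaussian noise applied to a fixed weight.
# z ~ N(1, alpha)  =>  w = theta * z
z = rng.normal(1.0, np.sqrt(alpha), size=n)
w_mult = theta * z

# View 2: the induced "prior" view of the same quantity:
# w ~ N(theta, alpha * theta^2), a Gaussian whose scale depends on theta.
w_prior = rng.normal(theta, np.sqrt(alpha) * abs(theta), size=n)

# Both samples share the same mean (theta) and std (sqrt(alpha)*|theta|).
print(w_mult.mean(), w_prior.mean())
print(w_mult.std(), w_prior.std())
```

Dropout replaces the Gaussian z with Bernoulli noise, and the paper's contribution is showing that such multiplicative noise, with random scales, induces structured shrinkage priors in full generality, without approximation.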