Dropout regularization of deep neural networks has been a mysterious yet effective tool for preventing overfitting. Explanations for its success range from the prevention of "co-adapted" weights to its interpretation as a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e., dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given this equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for residual networks (ResNets), terming the prior "automatic depth determination," as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation on regression benchmarks.
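The reparametrization underlying the claimed equivalence can be illustrated with a minimal Monte Carlo sketch (not from the paper; the Gaussian-noise case is used purely for illustration): multiplying a weight θ by noise z ~ N(1, α) yields exactly the same distribution as drawing the weight from a Gaussian whose scale is tied to the weight's own magnitude, N(θ, αθ²) — a scale-mixture-style view of the noisy weight.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, alpha, n = 2.0, 0.25, 1_000_000  # illustrative values, not from the paper

# View 1: multiplicative Gaussian noise applied to a fixed weight.
# z ~ N(1, alpha)  =>  w = theta * z
z = rng.normal(1.0, np.sqrt(alpha), size=n)
w_mult = theta * z

# View 2: the induced "prior" view of the same quantity:
# w ~ N(theta, alpha * theta^2), a Gaussian whose scale depends on theta.
w_prior = rng.normal(theta, np.sqrt(alpha) * abs(theta), size=n)

# Both samples share the same mean (theta) and std (sqrt(alpha)*|theta|).
print(w_mult.mean(), w_prior.mean())
print(w_mult.std(), w_prior.std())
```

Dropout replaces the Gaussian z with Bernoulli noise, and the paper's contribution is showing that such multiplicative noise, with random scales, induces structured shrinkage priors in full generality, without approximation.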