On the optimization and pruning for Bayesian deep learning

The goal of Bayesian deep learning is to provide uncertainty quantification via the posterior distribution. However, exact inference over the weight space is computationally intractable due to the ultra-high dimensionality of neural networks. Variational inference (VI) is a promising approach, but its naive application in weight space does not scale well and often underperforms in predictive accuracy. In this paper, we propose a new adaptive variational Bayesian algorithm for training neural networks in weight space that achieves high predictive accuracy. We show that this algorithm is equivalent to Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with a preconditioning matrix, and, building on this equivalence, we propose an MCMC-within-EM algorithm that incorporates a spike-and-slab prior to capture the sparsity of the neural network. The EM-MCMC algorithm allows us to perform optimization and model pruning in one shot. We evaluate our methods on the CIFAR-10, CIFAR-100, and ImageNet datasets, and demonstrate that our dense model can reach state-of-the-art performance and that our sparse model performs very well compared with previously proposed pruning schemes.
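The abstract does not give the update rules, so what follows is only a minimal sketch of the general EM-MCMC idea under stated assumptions, not the paper's algorithm. It assumes a continuous (Gaussian) spike-and-slab prior with hypothetical spike variance `v0`, slab variance `v1`, and prior inclusion probability `prior_incl`; a simplified SGHMC step with a fixed diagonal preconditioner (the identity here, standing in for the adaptive preconditioner, which the abstract does not specify); an E-step that computes each weight's posterior inclusion probability from the current sample; and an M-step that resets `prior_incl` to the mean inclusion probability. Weights whose averaged inclusion probability falls below 0.5 are pruned. All function names and hyperparameter values are illustrative, and a toy sparse linear regression stands in for a neural network.

```python
import math
import random


def sghmc_step(theta, v, grad, precond, lr, friction, temperature, rng):
    """One SGHMC step with a fixed diagonal preconditioner (updates in place).

    Simplified form: ignores the gradient-noise estimate and the correction
    term that is needed when the preconditioner depends on theta.
    """
    for i in range(len(theta)):
        eta = lr * precond[i]  # preconditioned step size for coordinate i
        noise_sd = math.sqrt(2.0 * friction * eta * temperature)
        v[i] = (1.0 - friction) * v[i] - eta * grad[i] + rng.gauss(0.0, noise_sd)
        theta[i] += v[i]


def inclusion_prob(w, v0, v1, prior_incl):
    """E-step: P(w drawn from the slab N(0, v1) rather than the spike N(0, v0)).

    The common 2*pi factor of the two Gaussian densities cancels and is omitted.
    """
    log_slab = math.log(prior_incl) - 0.5 * math.log(v1) - w * w / (2.0 * v1)
    log_spike = math.log(1.0 - prior_incl) - 0.5 * math.log(v0) - w * w / (2.0 * v0)
    m = max(log_slab, log_spike)  # subtract the max for numerical stability
    a, b = math.exp(log_slab - m), math.exp(log_spike - m)
    return a / (a + b)


def em_mcmc_prune(grad_nll, dim, n_iter=2000, lr=1e-3, friction=0.1,
                  v0=1e-2, v1=1.0, prior_incl=0.5, seed=0):
    """Alternate SGHMC sampling with spike-and-slab E/M updates, then prune.

    All hyperparameter defaults are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    theta, v = [0.0] * dim, [0.0] * dim
    precond = [1.0] * dim  # placeholder; an adaptive preconditioner would go here
    gamma_sum, n_avg = [0.0] * dim, 0
    for t in range(n_iter):
        # E-step on the current sample: posterior inclusion probabilities.
        gamma = [inclusion_prob(w, v0, v1, prior_incl) for w in theta]
        # M-step: update the prior inclusion rate, clipped away from 0 and 1.
        prior_incl = min(max(sum(gamma) / dim, 1e-3), 1.0 - 1e-3)
        # Energy gradient = data term + expected spike-and-slab (adaptive ridge) penalty.
        g = grad_nll(theta)
        grad = [g[i] + theta[i] * (gamma[i] / v1 + (1.0 - gamma[i]) / v0)
                for i in range(dim)]
        sghmc_step(theta, v, grad, precond, lr, friction, 1.0, rng)
        if t >= n_iter // 2:  # average gamma over the second half only
            gamma_sum = [s + gi for s, gi in zip(gamma_sum, gamma)]
            n_avg += 1
    mean_gamma = [s / n_avg for s in gamma_sum]
    mask = [gm > 0.5 for gm in mean_gamma]  # keep weights that are more likely "slab"
    pruned = [w if keep else 0.0 for w, keep in zip(theta, mask)]
    return pruned, mask, mean_gamma


def demo(seed=0):
    """Toy sparse linear regression: only weights 0 and 2 are nonzero."""
    rng = random.Random(seed)
    w_true, n, noise_sd = [2.0, 0.0, -1.5, 0.0, 0.0], 100, 0.5
    X = [[rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    y = [sum(a * b for a, b in zip(x, w_true)) + rng.gauss(0.0, noise_sd) for x in X]

    def grad_nll(w):
        # Gradient of sum_i (x_i . w - y_i)^2 / (2 * noise_sd^2), full batch.
        g = [0.0] * len(w)
        for x, yi in zip(X, y):
            r = (sum(a * b for a, b in zip(x, w)) - yi) / noise_sd ** 2
            for j in range(len(w)):
                g[j] += r * x[j]
        return g

    return em_mcmc_prune(grad_nll, len(w_true), seed=seed)


if __name__ == "__main__":
    pruned, mask, mean_gamma = demo()
    print("mask:", mask)
    print("pruned weights:", [round(w, 2) for w in pruned])
```

Basing the pruning decision on inclusion probabilities averaged over the second half of the run, rather than on a single draw, makes it less sensitive to individual SGHMC samples. A real network would replace the full-batch `grad_nll` with a minibatch estimate, which adds gradient noise that this simplified step does not correct for.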
