In this paper, we propose a generalization of the Batch Normalization (BN) algorithm, diminishing batch normalization (DBN), in which the BN parameters are updated with a diminishing moving average. BN is so effective in accelerating the convergence of neural network training that it has become common practice. Our proposed DBN algorithm retains the overall structure of the original BN algorithm while introducing a weighted averaging update for some of the trainable parameters. We prove that the DBN algorithm converges to a stationary point with respect to the trainable parameters. Our analysis extends readily to the original BN algorithm by setting certain parameters to constants. To the best of the authors' knowledge, this is the first convergence analysis of its kind for training with Batch Normalization. We analyze a two-layer model with an arbitrary activation function; the primary challenge of the analysis is that some parameters are updated by gradient steps while others are not. The convergence analysis applies to any activation function satisfying our common assumptions. In the numerical experiments, we test the proposed algorithm on modern CNN models with stochastic gradients and ReLU activations, and we observe that DBN outperforms the original BN algorithm on the MNIST, NI, and CIFAR-10 datasets with reasonably complex FNN and CNN models.
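To make the central idea concrete, the following is a minimal sketch of a diminishing moving-average update for the BN statistics, where the weight placed on the current mini-batch shrinks as training proceeds. The function name, the `alpha0` parameter, and the 1/step schedule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dbn_update(mu_running, var_running, batch_x, step, alpha0=1.0):
    """Sketch of a diminishing moving-average update for BN statistics.

    The weight on the new batch (alpha) decreases with the iteration
    count, so the running estimates stabilize over time. The 1/(step+1)
    schedule is an assumed example of a diminishing sequence.
    """
    alpha = alpha0 / (step + 1)          # diminishing averaging weight
    mu_batch = batch_x.mean(axis=0)      # per-feature batch mean
    var_batch = batch_x.var(axis=0)      # per-feature batch variance
    mu_running = (1 - alpha) * mu_running + alpha * mu_batch
    var_running = (1 - alpha) * var_running + alpha * var_batch
    return mu_running, var_running
```

Note that holding `alpha` at a constant value recovers a standard exponential moving average of the BN statistics, which mirrors the remark above that the analysis specializes to the original BN algorithm when some parameters are set to constants.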