Batch Normalization (BN) has proven to be an effective algorithm for deep neural network training by normalizing the input to each neuron and reducing the internal covariate shift. The space of weight vectors in the BN layer can be naturally interpreted as a Riemannian manifold, which is invariant to linear scaling of weights. Following the intrinsic geometry of this manifold provides a new learning rule that is more efficient and easier to analyze. We also propose intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the manifold. The resulting algorithm consistently outperforms the original BN on various types of network architectures and datasets.
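A minimal numerical sketch of the scale-invariance property the abstract refers to (not code from the paper; the helper bn_unit and the eps constant are illustrative assumptions). It checks that the batch-normalized output of a linear unit is unchanged when its incoming weight vector is rescaled by a positive constant, which is what allows the weight space to be treated as a Riemannian manifold rather than a flat Euclidean space.

```python
import numpy as np

def bn_unit(x, w, eps=1e-5):
    """Pre-activation of one neuron, batch-normalized over the batch axis.

    Illustrative helper, not the paper's implementation: z = x @ w is
    standardized using the batch mean and variance, as in vanilla BN
    (learnable scale/shift omitted for clarity).
    """
    z = x @ w
    mu, var = z.mean(), z.var()
    return (z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))      # a batch of 128 inputs
w = rng.normal(size=16)             # weight vector of one BN unit

out = bn_unit(x, w)
out_scaled = bn_unit(x, 5.0 * w)    # rescale w by any c > 0

# BN output is (numerically) unchanged, so only the direction of w matters.
print(np.allclose(out, out_scaled))  # True
```

Because only the direction of w affects the output, gradient steps along the radial direction are wasted, which motivates optimizing directly on the manifold of weight directions instead.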