Standard practice in training neural networks is to initialize the weights independently. The results of recent work suggest that feature "diversity" at initialization plays an important role in training the network. However, other initialization schemes with reduced feature diversity have also been shown to be viable. In this work, we conduct a series of experiments aimed at elucidating the importance of feature diversity at initialization. We show that a complete lack of diversity is harmful to training, but that its effects can be counteracted by a relatively small addition of noise; even the noise inherent in standard non-deterministic GPU computations is sufficient. Furthermore, we construct a deep convolutional network that has identical features at initialization and almost all of its weights initialized to 0, yet can be trained to reach accuracy matching that of its standard-initialized counterpart.
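As a concrete illustration of the kind of low-diversity initialization described above, the following PyTorch sketch initializes every filter of a convolutional layer to a single shared template and then adds a small symmetry-breaking perturbation. The shared template, the noise scale, and the helper name are assumptions made for illustration, not the paper's exact construction.

import torch
import torch.nn as nn

def identical_feature_init(conv: nn.Conv2d, noise_std: float = 1e-4) -> None:
    # Set every output filter to the same template (zero feature diversity),
    # then add tiny Gaussian noise to break the symmetry between features.
    # Both the random template and noise_std are illustrative assumptions.
    with torch.no_grad():
        template = torch.randn(1, conv.in_channels, *conv.kernel_size)
        conv.weight.copy_(template.expand_as(conv.weight))  # identical filters
        if conv.bias is not None:
            conv.bias.zero_()
        if noise_std > 0:
            conv.weight.add_(noise_std * torch.randn_like(conv.weight))

# Example: all 64 filters start identical up to the small perturbation.
layer = nn.Conv2d(3, 64, kernel_size=3)
identical_feature_init(layer)

Without the perturbation, identically initialized filters receive identical gradients and remain identical throughout training; the abstract's observation is that even noise as weak as GPU non-determinism suffices to break this symmetry.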