Spectral normalization (SN) is a widely used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs). However, there is currently limited understanding of why SN is effective. In this work, we show that SN controls two important failure modes of GAN training: exploding and vanishing gradients. Our proofs illustrate a (perhaps unintentional) connection with the successful LeCun initialization. This connection helps to explain why the most popular implementation of SN for GANs requires no hyper-parameter tuning, whereas stricter implementations of SN have poor empirical performance out of the box. Unlike LeCun initialization, which only controls gradient vanishing at the beginning of training, SN preserves this property throughout training. Building on this theoretical understanding, we propose a new spectral normalization technique: Bidirectional Scaled Spectral Normalization (BSSN), which incorporates insights from later improvements to LeCun initialization: Xavier initialization and Kaiming initialization. Theoretically, we show that BSSN gives better gradient control than SN. Empirically, we demonstrate that it outperforms SN in sample quality and training stability on several benchmark datasets.
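For readers unfamiliar with the baseline technique, the following is a minimal NumPy sketch of standard spectral normalization (dividing a weight matrix by an estimate of its largest singular value obtained via power iteration). It illustrates only the conventional SN discussed in the first sentence; the paper's BSSN variant and its scaling factors are not shown here, and the function and variable names are illustrative rather than taken from any released implementation.

```python
import numpy as np

def spectral_norm(W, n_iters=1, u=None, eps=1e-12):
    """Return W / sigma(W), where sigma(W) is the largest singular value of W,
    estimated with a few steps of power iteration (standard SN sketch)."""
    out_dim, in_dim = W.shape
    if u is None:
        # Persistent left singular vector estimate; in practice it is reused
        # across training steps so one iteration per step suffices.
        u = np.random.randn(out_dim)
        u /= np.linalg.norm(u) + eps
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v  # estimate of the spectral norm of W
    return W / (sigma + eps), u

# Usage: normalize a discriminator layer's weight before each forward pass.
W = np.random.randn(128, 256)
W_sn, u = spectral_norm(W, n_iters=5)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # close to 1 after a few iterations
```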