Over-parameterization is ubiquitous nowadays in training neural networks, benefiting both optimization, by easing the search for global optima, and generalization, by reducing prediction error. However, compressive networks are desired in many real-world applications, and directly training small networks may get trapped in local optima. In this paper, instead of pruning or distilling an over-parameterized model into a compressive one, we propose a parsimonious learning approach based on differential inclusions of inverse scale spaces, which generates a family of models from simple to complex with better efficiency and interpretability than stochastic gradient descent in exploring the model space. It enjoys a simple discretization, the Split Linearized Bregman Iterations, with provable global convergence: from any initialization, the algorithmic iterates converge to a critical point of the empirical risk. One may exploit the proposed method to grow the complexity of neural networks progressively. Numerical experiments on MNIST, CIFAR-10/100, and ImageNet show that the method is promising for training large-scale models with favorable interpretability.
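For concreteness, a minimal sketch of the Split Linearized Bregman Iterations in their commonly cited form is given below, assuming the standard variable-splitting setup in which an augmented loss $\bar{L}(W,\Gamma)=L(W)+\tfrac{1}{2\nu}\|W-\Gamma\|_2^2$ couples the network weights $W$ to an auxiliary sparse variable $\Gamma$; the symbols $\kappa$, $\nu$, $\alpha_k$, the sub-gradient variable $Z$, and the soft-thresholding proximal map are taken from that standard formulation rather than from this abstract.
% Sketch of the Split Linearized Bregman Iterations (standard form; notation assumed).
\begin{align*}
  W_{k+1}      &= W_k - \kappa\,\alpha_k\,\nabla_W \bar{L}(W_k,\Gamma_k), \\
  Z_{k+1}      &= Z_k - \alpha_k\,\nabla_\Gamma \bar{L}(W_k,\Gamma_k), \\
  \Gamma_{k+1} &= \kappa \cdot \mathrm{prox}_{\|\cdot\|_1}(Z_{k+1}),
  \qquad \mathrm{prox}_{\|\cdot\|_1}(z) = \mathrm{sign}(z)\,\max(|z|-1,\,0).
\end{align*}
As the iterations proceed, entries of $\Gamma$ become nonzero in an inverse-scale-space order, so the path traces out models of increasing complexity while $W$ follows a gradient-descent-like update.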