Past works have shown that, somewhat surprisingly, over-parametrization can help generalization in neural networks. Towards explaining this phenomenon, we adopt a margin-based perspective. We establish: 1) for multi-layer feedforward ReLU networks, the global minimizer of a weakly-regularized cross-entropy loss attains the maximum normalized margin among all networks; 2) as a consequence, increasing the over-parametrization improves the normalized margin and the generalization error bounds for two-layer networks. In particular, an infinite-width neural network enjoys the best generalization guarantees. The most common infinite-feature methods are kernel methods; we compare the neural network margin with the kernel-method margin and construct natural instances where kernel methods have much weaker generalization guarantees. We validate this gap between the two approaches empirically. Finally, this infinite-neuron viewpoint is also fruitful for analyzing optimization: we show that a perturbed gradient flow on infinite-width networks finds a global optimizer in polynomial time.
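For concreteness, a minimal sketch of the normalized margin referenced in claim 1), under the assumption that $f(\theta; x)$ denotes the output of an $L$-layer bias-free ReLU network with parameters $\theta$ on a labeled example $(x_i, y_i)$, $y_i \in \{-1, +1\}$ (the symbols $\theta$, $L$, and $(x_i, y_i)$ are notation introduced here for illustration); since such a network is positively homogeneous of degree $L$ in $\theta$, the margin of the normalized parameters has the closed form
\[
\gamma(\theta) \;=\; \min_{i}\, y_i\, f\!\left(\tfrac{\theta}{\|\theta\|_2};\, x_i\right) \;=\; \frac{\min_{i}\, y_i\, f(\theta; x_i)}{\|\theta\|_2^{\,L}},
\]
and claim 1) states that, as the regularization strength tends to zero, the global minimizer of the regularized cross-entropy loss maximizes this quantity over all networks of the same architecture.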