This paper explains why deep learning can generalize well despite large capacity, possible algorithmic instability, nonrobustness, and sharp minima, addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods; its simplest member is empirically shown to improve base models and achieve state-of-the-art performance on the MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.