Deep neural networks (DNNs) generalize remarkably well without explicit regularization, even in the strongly over-parametrized regime where classical learning theory would instead predict severe overfitting. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus on the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real-world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region, then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight PAC-Bayes generalization error bounds that correlate well with the true error on realistic datasets such as MNIST and CIFAR10, and for architectures including convolutional and fully connected networks.
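To make the two key quantities concrete, the bounds invoked above can be sketched as follows (in our notation; a schematic summary rather than the precise statements of the underlying results). The AIT probability-complexity bound takes the form
\[
P(f) \;\le\; 2^{-a\tilde{K}(f) + b},
\]
where \(P(f)\) is the probability that the parameter-function map produces the function \(f\) under (approximately) uniform sampling of parameters, \(\tilde{K}(f)\) is a computable approximation to the Kolmogorov complexity of \(f\), and \(a, b\) are constants that depend on the map but not on \(f\). For the PAC-Bayes step, given a training set \(U\) of size \(m\) that the network fits with zero error, the expected generalization error \(\epsilon\) satisfies, with probability at least \(1-\delta\),
\[
\ln\frac{1}{1-\epsilon} \;\le\; \frac{\ln\frac{1}{P(U)} + \ln\frac{2m}{\delta}}{m-1},
\]
where \(P(U)\) is the marginal likelihood of the training labels under the prior over functions; it is this quantity that the Gaussian-process correspondence is used to estimate.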