Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the loss function generally depends only on the realization of the neural network, i.e. the function it computes. Studying the functional optimization problem over the space of realizations can open up completely new ways to understand neural network training. In particular, usual loss functions like the mean squared error are convex on sets of neural network realizations, which themselves are non-convex. Note, however, that each realization has many different, possibly degenerate, parametrizations. In particular, a local minimum in the parametrization space need not correspond to a local minimum in the realization space. To establish such a connection, inverse stability of the realization map is required, meaning that proximity of realizations must imply proximity of the corresponding parametrizations. In this paper we present pathologies which prevent inverse stability in general, and proceed to establish a restricted set of parametrizations on which we have inverse stability with respect to a Sobolev norm. Furthermore, we show that by optimizing over such restricted sets, it is still possible to learn any function that can be learned by optimization over unrestricted sets. While most of this paper focuses on shallow networks, none of the methods used are, in principle, limited to shallow networks, and it should be possible to extend them to deep neural networks.
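To illustrate why inverse stability fails without restrictions, the following minimal NumPy sketch (the architecture, function names, and the specific rescaling degeneracy shown are chosen for illustration and are not taken from this paper) exhibits two parametrizations of a shallow ReLU network that are far apart in parameter space yet define exactly the same realization; since ReLU is positively homogeneous, scaling the first layer by c > 0 and the second by 1/c leaves the computed function unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realization(params, x):
    """Realization of a shallow ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ relu(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3)); b1 = rng.standard_normal(5)
W2 = rng.standard_normal((1, 5)); b2 = rng.standard_normal(1)
theta = (W1, b1, W2, b2)

# Rescaling degeneracy: relu(c * z) = c * relu(z) for c > 0, so scaling the
# first layer by c and the second layer by 1/c preserves the realization.
c = 1000.0
theta_scaled = (c * W1, c * b1, W2 / c, b2)

x = rng.standard_normal(3)
print(np.allclose(realization(theta, x), realization(theta_scaled, x)))  # True
# The parametrizations themselves are far apart, so closeness of realizations
# cannot, in general, imply closeness of parametrizations:
print(np.linalg.norm(c * W1 - W1))
```

Restricting the admissible parametrizations (e.g. by fixing such rescalings) is what makes an inverse-stability statement possible at all.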