In this work, we establish non-asymptotic convergence bounds for the Gauss-Newton method in training neural networks with smooth activations. In the underparameterized regime, the Gauss-Newton gradient flow in parameter space induces a Riemannian gradient flow on a low-dimensional embedded submanifold of the function space. Using tools from Riemannian optimization, we establish geodesic Polyak-Lojasiewicz and Lipschitz-smoothness conditions for the loss under appropriately chosen output scaling, yielding geometric convergence to the optimal in-class predictor at an explicit rate independent of the conditioning of the Gram matrix. In the overparameterized regime, we propose adaptive, curvature-aware regularization schedules that ensure fast geometric convergence to a global optimum at a rate independent of the minimum eigenvalue of the neural tangent kernel and, locally, of the modulus of strong convexity of the loss. These results demonstrate that Gauss-Newton achieves accelerated convergence rates in settings where first-order methods exhibit slow convergence due to ill-conditioned kernel matrices and loss landscapes.
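To make the damped Gauss-Newton update concrete, below is a minimal, illustrative sketch in JAX of one Levenberg-Marquardt-style step for a least-squares loss on a small network with a smooth (tanh) activation. It is not the paper's algorithm: the function names (`predict`, `gauss_newton_step`), the toy data, and the simple `1/(t+1)` damping schedule are placeholders; the adaptive, curvature-aware regularization schedules proposed in the paper are not reproduced here.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def predict(params, X):
    """Small one-hidden-layer network with a smooth activation; one output per row of X."""
    W1, b1, w2 = params
    return jnp.tanh(X @ W1 + b1) @ w2

def gauss_newton_step(params, X, y, lam):
    """One damped Gauss-Newton step on the loss 0.5 * ||predict(params, X) - y||^2."""
    flat, unravel = ravel_pytree(params)
    resid = lambda p: predict(unravel(p), X) - y        # residual map r(p)
    r = resid(flat)                                      # residuals, shape (n,)
    J = jax.jacfwd(resid)(flat)                          # Jacobian dr/dp, shape (n, d)
    # Damped (Levenberg-Marquardt-style) direction: (J^T J + lam I) delta = -J^T r
    delta = jnp.linalg.solve(J.T @ J + lam * jnp.eye(flat.size), -J.T @ r)
    return unravel(flat + delta)

# Toy usage with a placeholder damping schedule (purely illustrative).
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 3))
y = jnp.sin(X[:, 0])
params = (0.1 * jax.random.normal(key, (3, 16)),
          jnp.zeros(16),
          0.1 * jax.random.normal(key, (16,)))
for t in range(20):
    lam = 1.0 / (t + 1)   # placeholder; the paper's adaptive schedule is not shown here
    params = gauss_newton_step(params, X, y, lam)
```

The dense solve of the (J^T J + lam I) system is only practical at this toy scale; it is used here to keep the damped Gauss-Newton direction explicit rather than to suggest an efficient implementation.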