An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependence on batch size, network width, and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.
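For concreteness, a generic mini-batch Gauss-Newton step with Levenberg-Marquardt damping can be sketched as follows; the notation $J_t$, $r_t$, $\lambda$, $\eta_t$ is illustrative, and the paper's exact step-size and damping schedule may differ:
\[
\theta_{t+1} \;=\; \theta_t \;-\; \eta_t \bigl(J_t^{\top} J_t + \lambda I\bigr)^{-1} J_t^{\top} r_t,
\]
where $J_t$ is the Jacobian of the network outputs with respect to the parameters $\theta_t$ evaluated on the sampled mini-batch, $r_t$ the vector of residuals on that batch, $\lambda > 0$ the Levenberg-Marquardt damping parameter, and $\eta_t$ the step size. The damped Gauss-Newton matrix $J_t^{\top} J_t + \lambda I$ is the curvature object whose spectrum along the optimization path enters the stability bounds discussed above.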