We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex), under an interpolation regime. At the heart of our analysis is a new generalization error bound for deterministic symmetric algorithms, which implies that average output stability and a bounded expected optimization error at termination lead to generalization. This result shows that small generalization error occurs along the optimization path, and allows us to bypass the Lipschitz or sub-Gaussian assumptions on the loss that are prevalent in previous work. For nonconvex, Polyak-Łojasiewicz (PL), convex, and strongly convex losses, we show the explicit dependence of the generalization error on the accumulated path-dependent optimization error, the terminal optimization error, the number of samples, and the number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under a proper choice of decreasing step size. Further, if the loss is nonconvex but the objective is PL, we derive quadratically vanishing bounds on the generalization error and the corresponding excess risk, for a choice of a large constant step size. For (resp. strongly) convex smooth losses, we prove that full-batch GD also generalizes for large constant step sizes, and achieves (resp. quadratically) small excess risk while training fast. In all cases, we close the generalization gap by showing matching generalization and optimization error rates. Our full-batch GD generalization error and excess risk bounds are strictly tighter than existing bounds for (stochastic) GD when the loss is smooth (but possibly non-Lipschitz).
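For reference, a minimal sketch of the quantities these guarantees concern, in generic notation introduced here for illustration (the paper's own symbols may differ): given a sample $S = \{z_1, \dots, z_n\}$ drawn i.i.d. from a distribution $\mathcal{D}$ and a smooth loss $f$, the empirical and population risks are
\[
F_S(w) \triangleq \frac{1}{n} \sum_{i=1}^{n} f(w; z_i), \qquad F(w) \triangleq \mathbb{E}_{z \sim \mathcal{D}}\!\left[ f(w; z) \right],
\]
full-batch GD runs $w_{t+1} = w_t - \eta_t \nabla F_S(w_t)$ for $t = 1, \dots, T$ with step sizes $\eta_t$, and the generalization error and excess risk at termination are
\[
% generic definitions; expectation is over the draw of S (and is all that is random for a deterministic algorithm)
\epsilon_{\mathrm{gen}} \triangleq \mathbb{E}\!\left[ F(w_T) - F_S(w_T) \right], \qquad \epsilon_{\mathrm{risk}} \triangleq \mathbb{E}\!\left[ F(w_T) \right] - \inf_{w} F(w).
\]
The PL condition referenced above is the standard inequality $\tfrac{1}{2} \| \nabla F_S(w) \|^2 \geq \mu \left( F_S(w) - \inf_{v} F_S(v) \right)$ for some $\mu > 0$.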