We introduce \textit{basic inequalities} for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While related inequalities appear in the literature, we isolate and highlight a specific form and develop it as a well-rounded tool for statistical analysis. Let $f$ denote the objective function to be optimized. Given a first-order iterative algorithm initialized at $\theta_0$ with current iterate $\theta_T$, the basic inequality upper bounds $f(\theta_T)-f(z)$ for any reference point $z$ in terms of the accumulated step sizes and the distances between $\theta_0$, $\theta_T$, and $z$. The bound translates the number of iterations into an effective regularization coefficient in the loss function. We demonstrate this framework through analyses of training dynamics and prediction risk bounds. In addition to revisiting and refining known results on gradient descent, we provide new results for mirror descent with Bregman divergence projection, for generalized linear models trained by gradient descent and exponentiated gradient descent, and for randomized predictors. We illustrate and supplement these theoretical findings with experiments on generalized linear models.
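As a minimal illustrative sketch (not necessarily the exact form developed in the paper), consider gradient descent $\theta_{t+1}=\theta_t-\eta_t\nabla f(\theta_t)$ applied to a convex, $L$-smooth $f$ with step sizes $\eta_t\le 1/L$; a standard telescoping argument then gives a bound of the shape described above:
% Illustrative sketch only, under the stated convexity/smoothness assumptions;
% the inequalities developed in this paper may take a different form.
\[
  f(\theta_T) - f(z)
  \;\le\;
  \frac{\lVert \theta_0 - z \rVert^2 - \lVert \theta_T - z \rVert^2}
       {2 \sum_{t=0}^{T-1} \eta_t}
  \qquad \text{for every reference point } z .
\]
Rearranged, this says $\theta_T$ is competitive with any $z$ up to a penalty $\lVert\theta_0-z\rVert^2\big/\bigl(2\sum_{t<T}\eta_t\bigr)$ centered at the initialization, so $\bigl(2\sum_{t<T}\eta_t\bigr)^{-1}$ acts as an effective ridge-type regularization coefficient that shrinks as training proceeds.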