Stochastic optimization is fundamental to modern machine learning. Recent research has extended the study of stochastic first-order methods (SFOMs) from light-tailed to heavy-tailed noise, which frequently arises in practice, with clipping emerging as a key technique for controlling heavy-tailed gradients. Extensive theoretical advances have further shown that the oracle complexity of SFOMs depends on the tail index $α$ of the noise. Nonetheless, existing complexity results typically cover only the case $α\in(1,2]$, that is, the regime in which the noise has a finite mean, and the corresponding complexity bounds tend to infinity as $α$ approaches $1$. This paper tackles the general case of noise with tail index $α\in(0,2]$, covering regimes that range from noise with bounded variance to noise with an infinite mean, the latter of which has scarcely been studied. Through a novel analysis of the bias-variance trade-off in gradient clipping, we show that when a symmetry measure of the noise tail is controlled, clipped SFOMs achieve improved complexity guarantees under heavy-tailed noise for any tail index $α\in(0,2]$. Our analysis of the bias-variance trade-off not only yields new unified complexity guarantees for clipped SFOMs across this full range of tail indices, but is also straightforward to apply and can be combined with classical analyses under light-tailed noise to establish oracle complexity guarantees under heavy-tailed noise. Finally, numerical experiments validate our theoretical findings.
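To make the setting concrete, a minimal sketch of the standard clipping operator and of a common $α$-th moment noise condition used in this line of work is given below; the notation ($\tau$ for the clipping threshold, $σ$ for the moment bound, $g(x)$ for the stochastic gradient estimate of $\nabla f(x)$) is illustrative and is not taken from the paper itself:
\[
\mathrm{clip}_{\tau}\bigl(g(x)\bigr) \;=\; \min\Bigl\{1,\; \frac{\tau}{\|g(x)\|}\Bigr\}\, g(x),
\qquad
\mathbb{E}\bigl[\|g(x) - \nabla f(x)\|^{α}\bigr] \;\le\; σ^{α},
\quad α \in (0,2].
\]
Under such a condition with $α \le 1$, the noise mean may be unbounded, which is the scarcely studied regime the abstract refers to; clipping caps the estimator at norm $\tau$, yielding a bounded second moment at the cost of an added bias, and it is this bias-variance trade-off that the paper analyzes.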