This paper considers smooth bilevel optimization in which the lower-level problem is strongly convex and the upper-level problem is possibly nonconvex. We focus on the stochastic setting, where the algorithm accesses unbiased stochastic gradient evaluations with heavy-tailed noise, a regime prevalent in many machine learning applications such as training large language models and reinforcement learning. We propose a nested-loop normalized stochastic bilevel approximation (N$^2$SBA) method for finding an $ε$-stationary point with a stochastic first-order oracle (SFO) complexity of $\tilde{\mathcal{O}}\big(κ^{\frac{7p-3}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{4p-2}{p-1}}\big)$, where $κ$ is the condition number, $p\in(1,2]$ is the order of the central moment of the noise, and $σ$ is the noise level. Furthermore, we specialize our idea to the nonconvex-strongly-concave minimax optimization problem, achieving an $ε$-stationary point with an SFO complexity of $\tilde{\mathcal O}\big(κ^{\frac{2p-1}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{3p-2}{p-1}}\big)$. All of the above upper bounds match the best-known results in the special case of the bounded-variance setting, i.e., $p=2$. We also conduct numerical experiments to demonstrate the empirical superiority of the proposed methods.
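As a quick sanity check (an illustrative substitution, not taken from the paper's analysis), setting $p=2$ in the two exponents recovers the bounded-variance rates that the stated bounds are claimed to match:
\[
\tilde{\mathcal{O}}\big(κ^{\frac{7p-3}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{4p-2}{p-1}}\big)\Big|_{p=2} = \tilde{\mathcal{O}}\big(κ^{11}\, σ^{2}\, ε^{-6}\big),
\qquad
\tilde{\mathcal{O}}\big(κ^{\frac{2p-1}{p-1}} σ^{\frac{p}{p-1}} ε^{-\frac{3p-2}{p-1}}\big)\Big|_{p=2} = \tilde{\mathcal{O}}\big(κ^{3}\, σ^{2}\, ε^{-4}\big).
\]
As $p \downarrow 1$, every exponent blows up, reflecting the increasing difficulty of heavier-tailed noise.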