Versatile 用于渐进式模拟器的单线方法:第一和第二顺序最优化及其在联邦学习中的应用 (Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning)

While variance reduction methods have shown great success in solving large scale optimization problems, many of them suffer from accumulated errors and, therefore, should periodically require the full gradient computation. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refresh of the gradient estimator but achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility; (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first and second-order optimality of the output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)~heterogeneous, the algorithm requires fewer communication rounds then existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.

翻译：虽然减少差异的方法在解决大规模优化问题方面表现出了巨大的成功,但其中许多方法都存在累积错误,因此,应该定期要求完全梯度计算。在本文中,我们提出了一个名为 SLEDGE(Single-Loop mEthoD for Gradient Estimator)的单环算法(SLEDGE ), 用于限定和不折不扣的非convex优化, 这不需要定期更新梯度估计器,但达到接近最佳的梯度复杂性。与现有方法不同, SLEDGE 具有多功能的优势;(一) 第二阶最佳性, (二) PLL区域的指数趋同, (三) 在数据不那么繁杂的情况下, (三) 更小的复杂性。我们通过利用这些有利的特性,建立了高效的联邦学习算法。我们展示了产出的第一阶和第二阶的优化性, 并在PLE值条件下提供分析。当当地预算足够大,客户较少(赫西亚) ~遗传性时, 算法需要更少的交流周期,然后要求更少的通信周期,例如FedAvg, SCAFDEFOLD和Mime。我们的方法的优势在数字实验中得到验证。