On the Convergence of Local Descent Methods in Federated Learning

In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a. heterogeneous or non-i.i.d. data samples). In this paper, we generalize local stochastic and full gradient descent with periodic averaging, originally designed for homogeneous distributed optimization, to solve nonconvex optimization problems in federated learning. While some research is available on the effectiveness of local SGD in reducing the number of communication rounds in the homogeneous setting, its convergence and communication complexity in the heterogeneous setting have mostly been demonstrated empirically and lack a thorough theoretical understanding. To bridge this gap, we demonstrate that by properly analyzing the effect of unbiased gradients and the sampling schema in the federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting, both in the general nonconvex case and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity). Our theoretical results complement recent empirical studies that demonstrate the applicability of local GD/SGD to federated learning. We also specialize the proposed local method for networked distributed optimization. To the best of our knowledge, the obtained convergence rates are the sharpest known to date for local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.
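The abstract describes local gradient descent with periodic averaging only in words. The following is a minimal toy sketch of that general scheme, not the paper's algorithm or analysis. Assumptions: each device holds a hypothetical scalar quadratic loss f_j(x) = 0.5 * a_j * (x - b_j)^2, with different (a_j, b_j) standing in for heterogeneous data shards; every device participates in every round (the device sampling schema studied in the paper is omitted); and the names `local_descent` and `global_loss` are illustrative only.

```python
import random
from typing import List, Tuple

# Hypothetical heterogeneous setup: device j holds the scalar quadratic loss
#   f_j(x) = 0.5 * a_j * (x - b_j)**2,
# so different (a_j, b_j) pairs mimic non-i.i.d. data shards.
Device = Tuple[float, float]  # (a_j, b_j)


def global_loss(x: float, devices: List[Device]) -> float:
    """Global objective: the average of the per-device losses f_j(x)."""
    return sum(0.5 * a * (x - b) ** 2 for a, b in devices) / len(devices)


def local_descent(
    devices: List[Device],
    x0: float,
    lr: float,
    tau: int,
    rounds: int,
    noise_std: float = 0.0,
    seed: int = 0,
) -> float:
    """Local (S)GD with periodic averaging (toy sketch).

    In each round, every device starts from the shared model, runs `tau`
    local gradient steps on its own loss, and the server then averages the
    local models. noise_std = 0 gives local GD; noise_std > 0 adds Gaussian
    noise to every local gradient as a stand-in for stochastic gradients.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_models = []
        for a, b in devices:
            y = x  # each device starts from the current shared model
            for _ in range(tau):
                grad = a * (y - b) + rng.gauss(0.0, noise_std)
                y -= lr * grad
            local_models.append(y)
        # One communication round: average the local models.
        x = sum(local_models) / len(local_models)
    return x


if __name__ == "__main__":
    devices = [(1.0, -2.0), (1.0, 0.0), (1.0, 5.0)]
    x = local_descent(devices, x0=10.0, lr=0.1, tau=5, rounds=50)
    print(round(x, 4))  # -> 1.0 (the global minimizer is the mean of the b_j)
```

With `tau = 1`, this scheme reduces to ordinary full-participation gradient descent on the global loss. In this toy, when devices have different curvatures `a_j` and `tau > 1`, the averaged iterate can settle near, rather than exactly at, the global minimizer, which illustrates why heterogeneous data shards complicate the analysis of local methods.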