Decentralized stochastic optimization methods have gained significant attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on adaptive network topologies. We derive universal convergence rates for smooth (convex and non-convex) problems; the rates interpolate between the heterogeneous (non-identically-distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several respects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
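To make the algorithmic template concrete, the following is a minimal sketch (not the paper's exact scheme) of decentralized SGD in the style the abstract describes: each node runs a few local gradient steps on its own objective, then averages its iterate with its neighbors through a doubly stochastic mixing ("gossip") matrix `W`. The toy objectives, the ring mixing matrix, and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def decentralized_sgd(grads, x0, W, steps, lr, local_steps=1):
    """Sketch of the local-steps + gossip template.

    grads: one (possibly stochastic) gradient oracle per node.
    W:     doubly stochastic mixing matrix encoding the topology.
    Each round: `local_steps` gradient updates per node, then one
    gossip/averaging step X <- W @ X.
    """
    n = len(grads)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one row per node
    for _ in range(steps):
        for _ in range(local_steps):
            for i in range(n):
                X[i] -= lr * grads[i](X[i])
        X = W @ X  # gossip step: mix iterates with neighbors
    return X

# Toy heterogeneous setting: node i holds f_i(x) = 0.5 * ||x - b_i||^2,
# so the global minimizer of the average objective is mean(b_i).
rng = np.random.default_rng(0)
b = rng.normal(size=(4, 3))
grads = [lambda x, bi=bi: x - bi for bi in b]

# Doubly stochastic mixing matrix for a 4-node ring topology.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

X = decentralized_sgd(grads, np.zeros(3), W, steps=200, lr=0.1)
```

Because `W` is doubly stochastic, the average iterate across nodes follows plain gradient descent on the average objective and converges to `b.mean(axis=0)`; with a constant step size and heterogeneous data, the individual nodes hover within a small neighborhood of consensus, matching the heterogeneity-dependent terms in the rates.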