In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. By integrating a technique called gradient tracking into decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, exhibits several desirable characteristics for minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performance that matches centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Łojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error under appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
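To make the gradient-tracking mechanism behind GT-DSGD concrete, the following is a minimal numerical sketch of the update rule: each node takes a stochastic-gradient step corrected by a tracker of the global gradient, and both the iterates and the trackers are mixed over the network. The quadratic local losses, ring-network mixing matrix, noise model, and step-size below are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the GT-DSGD update, assuming synthetic quadratic local
# losses f_i(x) = 0.5 * ||A_i x - b_i||^2, a ring network with a doubly
# stochastic mixing matrix W, and additive Gaussian noise as the stochastic
# gradient model; all of these modeling choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                      # number of nodes, problem dimension
alpha = 0.05                     # constant step-size
A = rng.standard_normal((n, d, d))
b = rng.standard_normal((n, d))

def stoch_grad(i, x):
    """Stochastic gradient of node i's local loss (exact gradient plus noise)."""
    g = A[i].T @ (A[i] @ x - b[i])
    return g + 0.1 * rng.standard_normal(d)

# Doubly stochastic mixing matrix for a ring: each node averages with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))                                   # local iterates x_i
g = np.array([stoch_grad(i, x[i]) for i in range(n)])
y = g.copy()                                           # gradient trackers y_i, initialized to local gradients

for k in range(500):
    x_new = W @ (x - alpha * y)                        # mix the local descent steps over the network
    g_new = np.array([stoch_grad(i, x_new[i]) for i in range(n)])
    y = W @ y + g_new - g                              # gradient tracking: mix trackers, add gradient difference
    x, g = x_new, g_new

# The network average should approach a stationary point of the global loss.
x_bar = x.mean(axis=0)
grad_at_mean = sum(A[i].T @ (A[i] @ x_bar - b[i]) for i in range(n)) / n
print("norm of global gradient at network mean:", np.linalg.norm(grad_at_mean))
```

In this sketch, each tracker y_i asymptotically follows the average of the nodes' stochastic gradients, which is what allows the decentralized iterates to behave like centralized minibatch SGD in the regimes characterized in the paper.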