Nesterov动量是动量的变种。与Momentum唯一区别就是，计算梯度的不同，Nesterov动量先用当前的速度v更新一遍参数，在用更新的临时参数计算梯度。

### 最新论文

While many distributed optimization algorithms have been proposed for solving smooth or convex problems over the networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, we show that the proposed algorithm can achieve an $\epsilon$-stationary solution under a constant step size with $\mathcal{O}(1/\epsilon^2)$ computation complexity and $\mathcal{O}(1/\epsilon)$ communication complexity. When compared to the existing gradient tracking based methods, the proposed algorithm has the same order of computation complexity but lower order of communication complexity. To the best of our knowledge, the presented result is the first stochastic algorithm with the $\mathcal{O}(1/\epsilon)$ communication complexity for non-convex and non-smooth problems. Numerical experiments for a distributed non-convex regression problem and a deep neural network based classification problem are presented to illustrate the effectiveness of the proposed algorithms.

Top