Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, analyzing settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system and attempt to jointly solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents, but can even benefit from integrating such agents into the optimization process rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first- and zeroth-order optimization can be practical, even when training deep neural networks.
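To make the setting concrete, below is a minimal sketch (in NumPy) of how first-order and zeroth-order agents could be combined in a single synchronous averaging step. This is an illustration of the general idea only, not the paper's algorithm: the two-point finite-difference estimator, the function names (`zeroth_order_grad`, `hybrid_step`), and all parameter choices are assumptions made for this example.

```python
import numpy as np

def zeroth_order_grad(f, x, mu=1e-3, num_dirs=10, rng=None):
    """Two-point finite-difference gradient estimate of f at x.
    The estimate is noisy and, for finite mu, slightly biased."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape[0])            # random search direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

def hybrid_step(x, first_order_grads, f, num_zo_agents, lr=0.1):
    """One synchronous step: average exact stochastic gradients from
    first-order agents with zeroth-order estimates, then apply SGD."""
    grads = list(first_order_grads)
    grads += [zeroth_order_grad(f, x) for _ in range(num_zo_agents)]
    return x - lr * np.mean(grads, axis=0)

# Toy usage on f(x) = 0.5 * ||x||^2, whose exact gradient is x itself.
f = lambda x: 0.5 * float(np.dot(x, x))
x = np.ones(5)
for _ in range(200):
    x = hybrid_step(x, first_order_grads=[x.copy()], f=f, num_zo_agents=3)
```

Averaging the extra, noisier zeroth-order estimates together with the exact gradients is the kind of integration the abstract refers to; the bias and variance of the zeroth-order contribution shrink as `mu` decreases and `num_dirs` grows in this sketch.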