New Bounds For Distributed Mean Estimation and Variance Reduction

We consider the problem of distributed mean estimation (DME), in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$, and must cooperate to estimate the mean of their inputs $\mu = \frac{1}{n}\sum_{v = 1}^n x_v$, while minimizing total communication cost. DME is a fundamental construct in distributed machine learning, and there has been considerable work on variants of this problem, especially in the context of distributed variance reduction for stochastic gradients in parallel SGD. Previous work typically assumes an upper bound on the norm of the input vectors, and achieves an error bound in terms of this norm. However, in many real applications, the input vectors are concentrated around the correct output $\mu$, but $\mu$ itself has large norm. In such cases, previous output error bounds perform poorly. In this paper, we show that output error bounds need not depend on input norm. We provide a quantization method which allows distributed mean estimation to be performed with solution quality dependent only on the distance between inputs, not on input norm, and show an analogous result for distributed variance reduction. The technique is based on a new connection with lattice theory. We also provide lower bounds showing that the communication-to-error trade-off of our algorithms is asymptotically optimal. As the lattices achieving optimal bounds under the $\ell_2$-norm can be computationally impractical, we also present an extension which leverages easy-to-use cubic lattices, and is loose only up to a logarithmic factor in $d$. We show experimentally that our method yields practical improvements for common applications, relative to prior approaches.
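
The idea that error can depend on the distance between inputs rather than on their norm can be illustrated with a minimal sketch on the cubic lattice $\epsilon\mathbb{Z}^d$. This is an illustration under stated assumptions, not the paper's algorithm: the names `encode` and `decode`, the parameters `eps` (lattice side length) and `q` (number of residues sent per coordinate), and the use of deterministic nearest-point rounding are all hypothetical choices made here. The sender rounds its vector to the lattice and transmits each integer coordinate modulo `q`; the receiver uses its own nearby vector as a reference to pick the unique lattice point with those residues. Decoding is exact whenever $\|x - \mathrm{ref}\|_\infty / \epsilon + 1/2 < q/2$, a condition on the distance between the two vectors only.

```python
import numpy as np


def encode(x, eps, q):
    """Round x to the cubic lattice eps * Z^d and keep each coordinate mod q.

    Hypothetical helper: sends log2(q) bits per coordinate, regardless of ||x||.
    """
    z = np.rint(x / eps).astype(np.int64)  # nearest lattice point, in integer coordinates
    return np.mod(z, q)                    # residues in {0, ..., q-1}


def decode(colors, ref, eps, q):
    """Return the lattice point with residues `colors` that is closest to `ref`.

    Assumes (not checked here) that ||x - ref||_inf / eps + 1/2 < q / 2,
    where x is the vector that was encoded.
    """
    r = ref / eps
    k = np.rint((r - colors) / q)          # which multiple of q to add back
    return eps * (colors + q * k)


# Two machines whose inputs are close to each other but far from the origin.
rng = np.random.default_rng(0)
d, eps, q = 8, 0.25, 32
mu = 1e6 * np.ones(d)                      # large-norm mean
x_u = mu + rng.uniform(-1.0, 1.0, size=d)  # sender's vector
x_v = mu + rng.uniform(-1.0, 1.0, size=d)  # receiver's vector (used as reference)

message = encode(x_u, eps, q)
x_hat = decode(message, x_v, eps, q)
print("max decoding error:", np.max(np.abs(x_hat - x_u)))
```

In this sketch the per-coordinate error is at most `eps / 2`, and the message size is `d * log2(q)` bits, neither of which grows with the norm of `mu`. A randomized (unbiased) rounding step could be swapped in for `np.rint` in `encode` when an unbiased estimate is needed; that variation is not shown here.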
