In this paper, we design replicable algorithms for statistical clustering under the recently introduced notion of replicability. A clustering algorithm is replicable if, with high probability, it outputs exactly the same clusters when executed twice on datasets drawn from the same distribution, provided its internal randomness is shared across the two executions. We propose such algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems, using approximation routines for their combinatorial counterparts as black boxes. In particular, we give a replicable $O(1)$-approximation algorithm for statistical Euclidean $k$-medians ($k$-means) with $\operatorname{poly}(d)$ sample complexity. We also describe an $O(1)$-approximation algorithm with an additional $O(1)$ additive error for statistical Euclidean $k$-centers, albeit with $\exp(d)$ sample complexity.
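To make the definition of replicability concrete, the following toy Python sketch shows a replicable statistical procedure. It is not the paper's clustering algorithm. It estimates a one-dimensional mean and then snaps the estimate to a grid whose random offset comes from a shared seed. This random-offset rounding is a standard trick for making statistical estimates replicable. The names `replicable_mean`, `grid_width`, and `shared_seed` are hypothetical. Two runs on independent datasets return the identical value whenever their empirical means fall in the same grid cell, which is likely when the cell width is large relative to the sampling error. A raw empirical mean, by contrast, almost never repeats exactly.

```python
import random


def replicable_mean(samples, grid_width, shared_seed):
    """Toy replicable estimator (hypothetical, for illustration only).

    The grid offset is drawn from `shared_seed`, so runs sharing the seed
    use the same grid. If their empirical means land in the same grid
    cell, they return exactly the same value.
    """
    rng = random.Random(shared_seed)
    offset = rng.uniform(0.0, grid_width)      # shared internal randomness
    empirical = sum(samples) / len(samples)    # data-dependent estimate
    cell = (empirical - offset) // grid_width  # index of the enclosing cell
    return offset + (cell + 0.5) * grid_width  # midpoint of that cell


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Two independent datasets drawn from the same distribution N(0, 1).
    a = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
    b = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Same shared seed in both executions: outputs usually coincide exactly.
    print(replicable_mean(a, 0.5, shared_seed=7),
          replicable_mean(b, 0.5, shared_seed=7))
```

The grid width trades accuracy for replicability. Each output lies within `grid_width / 2` of the empirical mean. The chance that two runs disagree grows roughly with the ratio of the sampling error to `grid_width`.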