The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following $k-1$ centers is then sampled with probability proportional to its squared distance to the closest center chosen so far. Afterward, Lloyd's iterative algorithm is run. The $k$-means++ algorithm is known to return a $\Theta(\log k)$-approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees of the following \emph{greedy} variant: in every step, we sample $\ell$ candidate centers instead of one and then pick the candidate that minimizes the new cost. This is also how $k$-means++ is implemented in, e.g., the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and upper bounds for the greedy $k$-means++: We prove that it is an $O(\ell^3 \log^3 k)$-approximation algorithm. On the other hand, we prove a lower bound of $\Omega(\ell^3 \log^3 k / \log^2(\ell\log k))$. Previously, only an $\Omega(\ell \log k)$ lower bound was known [Bhattacharya, Eube, R\"oglin, Schmidt; ESA 2020], and there was no known upper bound.
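The greedy seeding procedure described above can be sketched as follows. This is a minimal illustrative Python implementation under our own naming conventions, not the authors' or Scikit-learn's reference code; for $\ell = 1$ it reduces to plain $k$-means++ seeding.

```python
import random

def greedy_kmeanspp_seeding(points, k, ell, seed=0):
    """Sketch of greedy k-means++ seeding: the first center is uniform at
    random; each later step draws `ell` candidates with probability
    proportional to the current squared distance to the closest center,
    then greedily keeps the candidate minimizing the resulting cost."""
    rng = random.Random(seed)

    def d2(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [rng.choice(points)]  # first center: uniform at random
    # cost[i] = squared distance of points[i] to its closest chosen center
    cost = [d2(p, centers[0]) for p in points]

    for _ in range(k - 1):
        # sample ell candidate centers, each proportional to current cost
        candidates = rng.choices(points, weights=cost, k=ell)
        # pick the candidate that minimizes the total new cost
        best = min(
            candidates,
            key=lambda c: sum(min(ci, d2(p, c)) for p, ci in zip(points, cost)),
        )
        centers.append(best)
        cost = [min(ci, d2(p, best)) for p, ci in zip(points, cost)]
    return centers
```

Lloyd's algorithm would then be run from the returned centers, as in the standard pipeline.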