Clustering is a fundamental problem in many areas, which aims to partition a given data set into groups based on some distance measure, such that the data points in the same group are similar while that in different groups are dissimilar. Due to its importance and NP-hardness, a lot of methods have been proposed, among which evolutionary algorithms are a class of popular ones. Evolutionary clustering has found many successful applications, but all the results are empirical, lacking theoretical support. This paper fills this gap by proving that the approximation performance of the GSEMO (a simple multi-objective evolutionary algorithm) for solving the three popular formulations of clustering, i.e., $k$-center, $k$-median and $k$-means, can be theoretically guaranteed. Furthermore, we prove that evolutionary clustering can have theoretical guarantees even when considering fairness, which tries to avoid algorithmic bias, and has recently been an important research topic in machine learning.
翻译:集群是许多领域的一个基本问题,它旨在将特定数据集分成基于某种距离测量的组群,因此同一组群中的数据点相似,而不同组群中的数据点则不同。由于其重要性和NP-硬性,已经提出了许多方法,其中进化算法是一种受欢迎的方法。进化算法已经发现许多成功的应用,但所有结果都是经验性的,缺乏理论支持。本文填补了这一空白,证明GMEMO(一种简单的多目标进化算法)的近似性能(一种简单的多目标进化算法)对于解决三种流行的组合组合式(即$-center、$-k$-medigen和$k$-meat-meat- means)可以理论上加以保证。此外,我们证明进化组合即使考虑到公平性,也可以有理论上的保证,试图避免算法上的偏差,而且最近已经成为机器学习的一个重要研究课题。