演化数据的标签一致性聚类 (Label-consistent clustering for evolving data)

Data analysis often involves an iterative process, where solutions must be continuously refined in response to new data. Typically, as new data becomes available, an existing solution must be updated to incorporate the latest information. In addition to seeking a high-quality solution for the task at hand, it is also crucial to ensure consistency by minimizing drastic changes from previous solutions. Applying this approach across many iterations, ensures that the solution evolves gradually and smoothly. In this paper, we study the above problem in the context of clustering, specifically focusing on the $k$-center problem. More precisely, we study the following problem: Given a set of points $X$, parameters $k$ and $b$, and a prior clustering solution $H$ for $X$, our goal is to compute a new solution $C$ for $X$, consisting of $k$ centers, which minimizes the clustering cost while introducing at most $b$ changes from $H$. We refer to this problem as label-consistent $k$-center, and we propose two constant-factor approximation algorithms for it. We complement our theoretical findings with an experimental evaluation demonstrating the effectiveness of our methods on real-world datasets.

翻译：数据分析通常涉及一个迭代过程，在此过程中，必须根据新数据不断优化解决方案。通常，随着新数据的出现，现有解决方案需要更新以整合最新信息。除了为当前任务寻求高质量解决方案外，确保一致性也至关重要，即最小化与先前解决方案的剧烈变动。通过在多轮迭代中应用此方法，可确保解决方案逐步平稳地演化。本文在聚类背景下研究上述问题，特别聚焦于$k$-中心问题。具体而言，我们研究以下问题：给定点集$X$、参数$k$和$b$，以及$X$的先前聚类解$H$，我们的目标是计算$X$的新解$C$，该解由$k$个中心组成，在最小化聚类成本的同时，相较于$H$最多引入$b$处变动。我们将此问题称为标签一致性$k$-中心问题，并为之提出两种常数因子近似算法。我们通过实验评估验证了所提方法在真实数据集上的有效性，从而补充了理论研究成果。

相关内容