We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector distances in the cluster, and aims at minimizing the norm $g$ applied to the vector of cluster costs. Previously, we focused on certain special cases for which we designed constant-factor approximation algorithms. Our bounds for more general settings left, however, large gaps to the known bounds for the basic problems they capture. In this work, we provide a clearer picture of the approximability of these more general settings. First, we design an $O(\log^2 n)$-approximation algorithm for $(f, L_{1})$-clustering for any $f$. This improves upon our previous $\widetilde{O}(\sqrt{n})$-approximation. Second, we provide an $O(k)$-approximation for the general $(f,g)$-clustering problem, which improves upon our previous $\widetilde{O}(\sqrt{kn})$-approximation algorithm and matches the best-known upper bound for Min-Load $k$-Clustering. We then design an approximation algorithm for $(f,g)$-clustering that interpolates, up to polylog factors, between the best known bounds for $k$-Center, $k$-Median, Min-Sum of Radii, Min-Load $k$-Clustering, (Top, $L_{1}$)-clustering, and $(L_{\infty},g)$-clustering based on a newly defined parameter of $f$ and $g$.
翻译:我们重新审视了在近期工作[SODA'25]中提出的$(f,g)$-聚类问题,该问题涵盖了$k$-中心、$k$-中位数、半径最小和、最小负载$k$-聚类等基础聚类问题。该问题通过单调对称范数$f$作用于簇内距离向量来确定每个$k$簇的成本,并旨在最小化范数$g$作用于簇成本向量后的结果。此前,我们专注于特定特例,并设计了常数因子近似算法。然而,我们对更一般设置的边界估计与它们所涵盖的基本问题的已知边界之间存在较大差距。在本工作中,我们为这些更一般设置的近似性提供了更清晰的图景。首先,我们为任意$f$的$(f, L_{1})$-聚类设计了$O(\log^2 n)$近似算法,这改进了我们之前$\widetilde{O}(\sqrt{n})$的近似结果。其次,我们为一般$(f,g)$-聚类问题提供了$O(k)$近似算法,这改进了我们之前$\widetilde{O}(\sqrt{kn})$的近似算法,并与最小负载$k$-聚类的最佳已知上界相匹配。随后,我们设计了一种$(f,g)$-聚类近似算法,该算法基于新定义的$f$和$g$参数,在对数多项式因子内,插值了$k$-中心、$k$-中位数、半径最小和、最小负载$k$-聚类、(Top, $L_{1}$)-聚类及$(L_{\infty},g)$-聚类的最佳已知边界。