Kernel power $k$-means (KPKM) leverages a family of power means to mitigate the local-minima problem in kernel $k$-means. However, KPKM faces two key limitations: (1) the computational burden of the full kernel matrix restricts its use on large-scale data, and (2) its lack of genuine centroid-sample assignment learning reduces its robustness to noise. To overcome these challenges, we propose RFF-KPKM, which introduces the first approximation theory for applying random Fourier features (RFF) to KPKM. RFF-KPKM employs RFF to generate efficient, low-dimensional feature maps, bypassing the need to compute the full kernel matrix. Crucially, we are the first to establish strong theoretical guarantees for this combination: (1) an excess risk bound of $\mathcal{O}(\sqrt{k^3/n})$, (2) strong consistency with membership values, and (3) a $(1+\varepsilon)$ relative error bound achievable with RFF of dimension $\mathrm{poly}(\varepsilon^{-1}\log k)$. Furthermore, to improve robustness and enable multiple kernel learning, we propose IP-RFF-MKPKM, an improved possibilistic RFF-based multiple kernel power $k$-means. IP-RFF-MKPKM preserves the scalability of MKPKM via RFF and refines cluster assignments by combining the merits of possibilistic and fuzzy memberships. Experiments on large-scale datasets demonstrate the superior efficiency and clustering accuracy of the proposed methods compared to state-of-the-art alternatives.
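For reference, power $k$-means replaces the hard $\min$ over centroid distances with the power mean $M_s(d_1,\dots,d_k) = \big(\tfrac{1}{k}\sum_{i=1}^{k} d_i^{\,s}\big)^{1/s}$, annealing $s \to -\infty$ so that $M_s \to \min$. Below is a minimal sketch of the kind of RFF feature map the abstract refers to, assuming a Gaussian (RBF) kernel; the function name `rff_features` and its parameters are illustrative, not the paper's implementation. The key property is that the explicit features $Z$ satisfy $Z Z^\top \approx K$, so clustering can run on $Z$ in $\mathcal{O}(nDk)$ time per iteration instead of forming the $n \times n$ kernel matrix.

```python
import numpy as np

def rff_features(X, num_features=256, sigma=1.0, seed=None):
    """Random Fourier feature map z(x) for the Gaussian (RBF) kernel.

    Returns Z of shape (n, num_features) such that Z @ Z.T approximates
    the n x n kernel matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    letting downstream (power) k-means operate on Z without forming K.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Sample frequencies from the kernel's spectral density (Bochner's theorem)
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    # Random phase offsets, uniform on [0, 2*pi)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Quick sanity check of the approximation on toy data
X = np.random.default_rng(0).normal(size=(500, 10))
Z = rff_features(X, num_features=2048, sigma=2.0, seed=1)
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * 2.0**2))
print(np.abs(Z @ Z.T - K_exact).max())  # shrinks as num_features grows
```

The abstract's $(1+\varepsilon)$ relative error guarantee concerns how large `num_features` must be ($\mathrm{poly}(\varepsilon^{-1}\log k)$) for the clustering objective on $Z$ to track the exact kernel objective; the sketch above only illustrates the kernel approximation itself.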