Multi-view clustering has been empirically shown to improve learning performance by leveraging the complementary information across multiple views of data. In real-world scenarios, however, collecting strictly aligned views is challenging, so learning from both aligned and unaligned data is a more practical solution. Partially View-aligned Clustering (PVC) aims to learn correspondences between misaligned samples across views, so as to better exploit cross-view consistency and complementarity in both aligned and unaligned data. Yet most existing PVC methods fail to leverage unaligned data to capture the shared semantics among samples from the same cluster. Moreover, the inherent heterogeneity of multi-view data induces distributional shifts in representations, which makes the correspondences established between cross-view latent features inaccurate and, consequently, impairs learning effectiveness. To address these challenges, we propose a Semantic MAtching contRasTive learning model (SMART) for PVC. The main idea of our approach is to alleviate the influence of cross-view distributional shifts, thereby facilitating semantic matching contrastive learning that fully exploits semantic relationships in both aligned and unaligned data. Extensive experiments on eight benchmark datasets demonstrate that our method consistently outperforms existing approaches on the PVC problem.