Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data, often by enforcing invariance to input transformations such as rotations or blurring. Recent studies have highlighted two pivotal properties for effective representations: (i) avoiding dimensional collapse, where the learned features occupy only a low-dimensional subspace, and (ii) enhancing uniformity of the induced distribution. In this work, we introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representations. We provide a theoretical analysis demonstrating that T-REGS simultaneously mitigates dimensional collapse and promotes distribution uniformity on arbitrary compact Riemannian manifolds. Experiments on synthetic data and on classical SSL benchmarks validate the effectiveness of our approach at enhancing representation quality.
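As a rough illustration of the quantity the abstract refers to (a sketch only, not the authors' implementation; the function name `mst_length` is hypothetical), the MST length of a batch of representations can be computed from the pairwise Euclidean distance matrix. A fully collapsed batch yields length zero, while well-spread points yield a large value, which is why the MST length is a natural regularization target:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(z: np.ndarray) -> float:
    """Total edge length of the Euclidean MST over the rows of z.

    z: (n, d) array, one representation per row.
    Hypothetical helper for illustration; not the paper's code.
    """
    dist = squareform(pdist(z))          # (n, n) pairwise Euclidean distances
    mst = minimum_spanning_tree(dist)    # sparse matrix holding the MST edges
    return float(mst.sum())             # sum of the n-1 edge lengths

# Example: spread-out points have a large MST length,
# while a collapsed batch (all points identical) has length zero.
rng = np.random.default_rng(0)
spread = rng.normal(size=(128, 16))
collapsed = np.zeros((128, 16))
print(mst_length(spread), mst_length(collapsed))
```

In a training loop, one would subtract a (differentiable) version of this quantity from the SSL loss so that maximizing MST length discourages collapse; the sketch above uses non-differentiable scipy routines purely to show what is being measured.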