By leveraging contrastive learning, clustering, and other pretext tasks, unsupervised methods for learning image representations have reached impressive results on standard benchmarks. The result has been a crowded field: many methods with substantially different implementations yield results that seem nearly identical on popular benchmarks, such as linear evaluation on ImageNet. However, a single result does not tell the whole story. In this paper, we compare methods using performance-based benchmarks such as linear evaluation, nearest neighbor classification, and clustering on several different datasets, demonstrating the lack of a clear front-runner within the current state of the art. In contrast to prior work that performs only supervised vs. unsupervised comparisons, we compare several different unsupervised methods against each other. To enrich this comparison, we analyze embeddings with measurements such as uniformity, tolerance, and centered kernel alignment (CKA), and propose two new metrics of our own: nearest neighbor graph similarity and linear prediction overlap. Our analysis reveals that single popular methods, taken in isolation, should not be treated as though they represent the field as a whole, and that future work ought to consider how to leverage the complementary nature of these methods. We also leverage CKA to provide a framework to robustly quantify augmentation invariance, and provide a reminder that certain types of invariance will be undesirable for downstream tasks.
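As an illustration of the CKA measurement mentioned above, the following is a minimal sketch of the linear variant of centered kernel alignment between two embedding matrices. The abstract does not state which CKA variant or kernel is used, so the choice of linear CKA, the function name `linear_cka`, and the random placeholder embeddings are all assumptions made for illustration only. Under the same assumptions, augmentation invariance could be probed by passing embeddings of the original images as `x` and embeddings of their augmented counterparts as `y`.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two embedding matrices of shape (n_samples, dim).

    The two matrices must describe the same n_samples inputs in the same
    order; their feature dimensions may differ.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical usage: random placeholders stand in for real model embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 16))   # embeddings from a first method
emb_b = rng.normal(size=(200, 32))   # embeddings from a second method
print(linear_cka(emb_a, emb_b))
```

The score lies between 0 and 1 and is unchanged by rotations and isotropic rescaling of either embedding space, which is what makes it suitable for comparing representations produced by different methods.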