Self-Supervised Learning by Estimating Twin Class Distributions

We present TWIST, a simple and theoretically explainable self-supervised representation learning method that classifies large-scale unlabeled datasets in an end-to-end way. We employ a Siamese network terminated by a softmax operation to produce twin class distributions for two augmented views of an image. Without supervision, we enforce the class distributions of different augmentations to be consistent. However, simply minimizing the divergence between augmentations leads to collapsed solutions, i.e., outputting the same class probability distribution for all images, in which case no information about the input image is left. To solve this problem, we propose to maximize the mutual information between the input and the class predictions. Specifically, we minimize the entropy of the distribution for each sample to make the class prediction for each sample assertive, and maximize the entropy of the mean distribution to make the predictions of different samples diverse. In this way, TWIST naturally avoids collapsed solutions without specific designs such as an asymmetric network, a stop-gradient operation, or a momentum encoder. As a result, TWIST outperforms state-of-the-art methods on a wide range of tasks. In particular, TWIST performs surprisingly well on semi-supervised learning, achieving 61.2% top-1 accuracy with 1% of ImageNet labels using a ResNet-50 backbone, surpassing the previous best results by an absolute improvement of 6.2%. Code and pre-trained models are available at: https://github.com/bytedance/TWIST
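
The three terms described in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the abstract does not state the exact formula, so the symmetric KL divergence used for the consistency term, the equal weighting of the three terms, and all function names (`twin_objective`, `mean_distribution`, and so on) are assumptions made for illustration only. The sketch relies on the standard identity that the mutual information between input and prediction equals the entropy of the mean distribution minus the mean per-sample entropy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_distribution(batch):
    """Average the class distributions of a batch, class by class."""
    n = len(batch)
    return [sum(column) / n for column in zip(*batch)]


def twin_objective(logits_a, logits_b):
    """Illustrative loss for two views of the same batch (lower is better).

    Assumed form: consistency + sharpness - diversity, equally weighted.
    """
    probs_a = [softmax(z) for z in logits_a]
    probs_b = [softmax(z) for z in logits_b]
    n = len(probs_a)

    # Consistency: the two views of each sample should agree.
    # Symmetric KL is an assumption, not taken from the abstract.
    consistency = sum(
        0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        for p, q in zip(probs_a, probs_b)
    ) / n

    # Sharpness: low per-sample entropy makes each prediction assertive.
    sharpness = sum(entropy(p) for p in probs_a + probs_b) / (2 * n)

    # Diversity: high entropy of the mean distribution spreads samples
    # across classes, which rules out the collapsed solution.
    diversity = 0.5 * (
        entropy(mean_distribution(probs_a)) + entropy(mean_distribution(probs_b))
    )

    loss = consistency + sharpness - diversity
    terms = {
        "consistency": consistency,
        "sharpness": sharpness,
        "diversity": diversity,
    }
    return loss, terms


if __name__ == "__main__":
    # Collapsed case: every sample receives the same (uniform) prediction.
    collapsed = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]
    # Desired case: confident predictions spread evenly over the classes.
    diverse = [[20.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print("collapsed:", twin_objective(collapsed, collapsed)[0])
    print("diverse:  ", twin_objective(diverse, diverse)[0])
```

In this sketch a collapsed batch has zero consistency cost but also zero mutual information, so it gains nothing from the sharpness and diversity terms, whereas confident and evenly spread predictions drive the loss toward the negative logarithm of the number of classes.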
