This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) learns feature representations of images from an unlabeled dataset. The paper proposes using bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (i.e., the deep BoW descriptor) of a reference image when presented with a perturbed version of that image as input. The method thus aims to learn perturbation-invariant and context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the authors describe BowNet as a network consisting of a convolutional feature extractor $\Phi(\cdot)$ and a Dense-softmax layer $\Omega(\cdot)$, trained to predict BoW features from images. After BoW training, the features of $\Phi$ are used in downstream tasks. For this challenge, we tried to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unable to reproduce an accuracy improvement comparable to the one the authors reported.
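The BoW prediction objective described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each local feature of the reference image is assigned to its nearest visual word, that the resulting histogram is L1-normalized, and that the prediction is scored with a cross-entropy loss. The names `bow_target` and `bow_loss`, the toy vocabulary, and the hard-coded logits (standing in for the output of $\Omega(\Phi(\cdot))$ on the perturbed image) are hypothetical.

```python
import math

def nearest_word(feature, vocabulary):
    """Index of the visual word closest (squared Euclidean) to a feature vector."""
    dists = [sum((f - v) ** 2 for f, v in zip(feature, word)) for word in vocabulary]
    return dists.index(min(dists))

def bow_target(features, vocabulary):
    """L1-normalized histogram of visual-word assignments for one image (assumed form)."""
    counts = [0.0] * len(vocabulary)
    for feature in features:
        counts[nearest_word(feature, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def bow_loss(logits, target):
    """Cross-entropy between the predicted word distribution and the BoW target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Toy vocabulary of three 2-D visual words (hypothetical values).
vocabulary = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]

# Local features of the unperturbed reference image (hypothetical values).
reference_features = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [3.8, 0.1]]
target = bow_target(reference_features, vocabulary)

# Stand-in for the logits the network would produce from the perturbed image.
logits = [0.2, 1.0, 0.1]
loss = bow_loss(logits, target)
```

In a real training loop the logits would come from the network and the loss would be backpropagated through $\Omega$ and $\Phi$; the sketch only shows how a target and a loss value of this kind could be computed.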