We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
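The "large and consistent dictionary" described above rests on two mechanisms: a momentum (moving-average) update that keeps the key encoder slowly consistent with the query encoder, and a fixed-size FIFO queue of encoded keys. A minimal toy sketch of these two ideas follows; it is not the authors' code, and the names, scalar "parameters", and queue size are all hypothetical simplifications for illustration.

```python
from collections import deque

def momentum_update(theta_k, theta_q, m=0.999):
    """Key-encoder parameters track the query encoder via an
    exponential moving average: theta_k <- m*theta_k + (1-m)*theta_q.
    (Toy scalar version; real encoders update every parameter tensor.)"""
    return m * theta_k + (1.0 - m) * theta_q

class KeyQueue:
    """Dictionary as a FIFO queue: the newest mini-batch of encoded keys
    is enqueued and the oldest keys are dequeued, decoupling the
    dictionary size from the mini-batch size."""
    def __init__(self, max_size):
        # deque with maxlen drops the oldest entries automatically
        self.queue = deque(maxlen=max_size)

    def enqueue(self, keys):
        for k in keys:
            self.queue.append(k)

    def __len__(self):
        return len(self.queue)

# Toy usage with scalar "parameters" and a tiny queue.
theta_q, theta_k = 1.0, 0.0
theta_k = momentum_update(theta_k, theta_q, m=0.9)  # 0.9*0.0 + 0.1*1.0 = 0.1

dictionary = KeyQueue(max_size=4)
dictionary.enqueue([1.0, 1.0, 1.0])  # first "mini-batch" of 3 keys
dictionary.enqueue([2.0, 2.0, 2.0])  # 3 more; oldest keys are evicted
```

A large momentum (e.g. 0.999) makes the key encoder evolve smoothly, so keys in the queue, though encoded at different steps, remain mutually consistent.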