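Contrastive learning learns representations by maximizing the agreement between differently augmented views of the same data sample through a contrastive loss in the latent space. Contrastive self-supervised learning techniques are a promising class of methods that build representations by learning to encode what makes two things similar or different.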

VIP content

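Self-supervised learning has gained popularity because it avoids the cost of annotating large-scale datasets. It can adopt self-defined pseudo-labels as supervision and use the learned representations for several downstream tasks. In particular, contrastive learning has recently become a dominant component of self-supervised learning methods in computer vision, natural language processing (NLP), and other domains. It aims to embed augmented versions of the same sample close to each other while pushing apart the embeddings of different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. It explains the pretext tasks commonly used in a contrastive learning setup, followed by the different architectures proposed so far. Next, it compares the performance of different methods on multiple downstream tasks such as image classification, object detection, and action recognition. Finally, it concludes with the limitations of current methods and the need for further techniques and future directions to make substantial progress.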
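The pull-together, push-apart objective described above can be sketched as an InfoNCE-style loss for a single anchor. This is a minimal, dependency-free illustration and not the survey's own code: the function names, the use of cosine similarity, and the temperature value of 0.1 are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Low when the anchor is close to its positive and far from the negatives.

    The temperature default is an arbitrary illustrative choice.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp; the positive sits at index 0.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Toy 2-D embeddings: the anchor and two embeddings of other samples.
anchor = [1.0, 0.0]
negatives = [[-1.0, 0.2], [0.0, -1.0]]

# An augmented view embedded near the anchor gives a small loss ...
loss_aligned = info_nce(anchor, [0.9, 0.1], negatives)
# ... while one embedded far from the anchor gives a large loss.
loss_misaligned = info_nce(anchor, [-0.9, 0.1], negatives)
```

Minimizing this quantity over an encoder's parameters is what moves views of the same sample together and embeddings of different samples apart.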

https://arxiv.org/abs/2011.00362

Overview:

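With the advancement of deep learning techniques, deep learning has become one of the core components of most intelligent systems. Deep neural networks (DNNs) can learn rich patterns from the abundance of data now available, which has made them a compelling approach for most computer vision (CV) tasks, such as image classification, object detection, image segmentation, and action recognition, as well as natural language processing (NLP) tasks such as sentence classification, language modeling, and machine translation. However, supervised approaches that learn features from labeled data have almost reached saturation because of the intense labor required to manually annotate millions of data samples. This is because most modern (supervised) computer vision systems try to learn some form of image representation by finding patterns between the data points and their respective annotations in large datasets. Works such as Grad-CAM [1] have proposed techniques that provide visual explanations for the decisions a model makes, making those decisions more transparent and explainable.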

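Traditional supervised learning approaches rely heavily on the amount of annotated training data available. Even though a plethora of data is available, the lack of annotations has pushed researchers to find alternative approaches that can make use of it. This is where self-supervised methods play a vital role in fueling the progress of deep learning: they need no expensive annotations, and they learn feature representations for which the data itself provides the supervision.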

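Supervised learning not only depends on expensive annotations but also suffers from issues such as generalization error, spurious correlations, and adversarial attacks [2]. Recently, self-supervised learning methods have integrated both generative and contrastive approaches that can use unlabeled data to learn the underlying representations. A popular approach is to propose various pretext tasks that help learn features using pseudo-labels. Tasks such as image inpainting, colorizing grayscale images, jigsaw puzzles, super-resolution, video frame prediction, and audio-visual correspondence have proven effective for learning good representations.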
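To make the idea of a pseudo-label concrete, here is a small sketch of a jigsaw-puzzle pretext task. The 2x2 tiling, the helper names, and the choice of the permutation index as the label are assumptions made for this illustration; published jigsaw methods differ in their details.

```python
import itertools
import random

# All 24 orderings of four tiles; the index into this list is the pseudo-label.
PERMUTATIONS = list(itertools.permutations(range(4)))

def split_into_tiles(image):
    """Split a 2D list with even height and width into four quadrant tiles."""
    h, w = len(image) // 2, len(image[0]) // 2
    tiles = []
    for row0 in (0, h):
        for col0 in (0, w):
            tiles.append([row[col0:col0 + w] for row in image[row0:row0 + h]])
    return tiles

def make_jigsaw_example(image, rng):
    """Return (shuffled_tiles, pseudo_label); no human annotation is involved."""
    tiles = split_into_tiles(image)
    label = rng.randrange(len(PERMUTATIONS))
    order = PERMUTATIONS[label]
    shuffled = [tiles[i] for i in order]
    return shuffled, label

# A toy 4x4 "image" whose pixel values are just their row-major indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled, label = make_jigsaw_example(image, random.Random(0))
```

A network trained to predict `label` from `shuffled` has to learn how the parts of an image relate spatially, and that supervision comes entirely from the data.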

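Generative models gained popularity after the introduction of Generative Adversarial Networks (GANs) [3] in 2014. That work later became the foundation for many successful architectures such as CycleGAN [4], StyleGAN [5], PixelRNN [6], Text2Image [7], and DiscoGAN [8]. These methods inspired more researchers to train deep learning models on unlabeled data in a self-supervised setup. Despite this success, researchers began to notice complications in GAN-based approaches. GANs are hard to train for two main reasons: (a) non-convergence: the model parameters oscillate widely and rarely converge; and (b) the discriminator becomes so successful that the generator network can no longer produce realistic fakes, so learning cannot continue. In addition, the generator and the discriminator must be kept properly synchronized so that the discriminator does not converge while the generator diverges.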
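One way to see failure mode (b) is through the standard minimax objective from the GAN paper cited as [3]. The notation here ($D$, $G$, $p_{\text{data}}$, $p_z$, and the discriminator logit $a$) does not appear in the text above and is introduced only for illustration:

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

Assuming the discriminator ends in a sigmoid, $D(G(z)) = \sigma(a)$, the generator's term has the gradient

$$
\frac{\partial}{\partial a} \log\left(1 - \sigma(a)\right) = -\sigma(a) = -D(G(z)).
$$

When the discriminator confidently rejects every fake, $D(G(z)) \to 0$, so this gradient vanishes and the generator receives almost no learning signal.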


Popular content

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
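The abstract does not spell out the loss itself. The sketch below shows the normalized temperature-scaled cross-entropy (NT-Xent) form commonly associated with SimCLR, written without dependencies for readability. The batch layout (`z[i]` and `z[i + N]` are two views of the same image), the temperature of 0.5, and the assumption that `z` already holds the outputs of the nonlinear projection head are all illustrative choices, not details taken from the text.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nt_xent(z, temperature=0.5):
    """Average NT-Xent loss over 2N projected views.

    Assumed layout: z[i] and z[i + N] are two augmented views of one image.
    """
    n = len(z) // 2
    z = [normalize(v) for v in z]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # Similarity of view i to every other view in the batch (self excluded).
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)

# N = 2 images, two views each. In the first batch the paired views agree.
views_agree = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
# In the second batch each view is paired with the other image's view.
views_disagree = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]

loss_agree = nt_xent(views_agree)
loss_disagree = nt_xent(views_disagree)
```

Every other view in the batch acts as a negative for a given anchor, which is one reason a larger batch, as in finding (3), gives the loss more to contrast against.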


Latest content

To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation. Code is available at: https://git.io/AdelaiDet
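A rough sketch of what a pixel-level contrastive loss with cross-view correspondence might look like follows. It is not the authors' implementation (their code is linked above): picking the positive as the most similar local feature in the other view, passing the negatives in explicitly, and the temperature of 0.2 are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dense_contrastive_loss(view1, view2, negatives, temperature=0.2):
    """Average per-location contrastive loss.

    view1, view2: lists of local feature vectors, one per spatial location.
    negatives: feature vectors assumed to come from other images.
    """
    losses = []
    for f in view1:
        # Assumed correspondence rule: the most similar local feature in view2.
        positive = max(view2, key=lambda g: cosine(f, g))
        logits = [cosine(f, positive) / temperature]
        logits += [cosine(f, n) / temperature for n in negatives]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[0])
    return sum(losses) / len(losses)

# Two locations per view; in view2 the locations are swapped (as after a flip).
view1 = [[1.0, 0.0], [0.0, 1.0]]
view2_related = [[0.1, 1.0], [1.0, 0.1]]
view2_unrelated = [[-1.0, -0.9], [-0.9, -1.0]]
negatives = [[-1.0, -1.0], [-1.0, 0.5]]

loss_related = dense_contrastive_loss(view1, view2_related, negatives)
loss_unrelated = dense_contrastive_loss(view1, view2_unrelated, negatives)
```

Because the positive is chosen by similarity and not by position, the loss still finds the right match when an augmentation moves content to a different location.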

