Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning

In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with a single tag, as in ImageNet. However, a single tag cannot describe all the important content of an image, so some useful visual information may be wasted during training. In this work, we propose to train CNNs on images annotated with multiple tags, to improve the quality of the visual representation learned by the model. To this end, we build a large-scale multi-label image database with 18M images and 11K categories, dubbed Tencent ML-Images. We efficiently train a ResNet-101 model with multi-label outputs on Tencent ML-Images, taking 90 hours for 60 epochs, using a large-scale distributed deep learning framework, i.e., TFplus. The quality of the visual representation of the Tencent ML-Images checkpoint is verified through three transfer learning tasks: single-label image classification on ImageNet and Caltech-256, object detection on PASCAL VOC 2007, and semantic segmentation on PASCAL VOC 2012. The Tencent ML-Images database, the ResNet-101 checkpoints, and all the training code have been released at https://github.com/Tencent/tencent-ml-images. We expect the release to benefit other vision tasks in the research and industry communities.
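To make the difference between single-tag and multi-tag supervision concrete, the following is a minimal sketch of a multi-label target and loss. It assumes independent per-class sigmoid outputs with binary cross-entropy, which is a common choice for multi-label training; the abstract does not state which loss the authors use. The helper names and the 5-category vocabulary are hypothetical.

```python
import math

def multi_hot(tags, num_classes):
    """Encode a set of tag indices as a 0/1 target vector (one slot per category)."""
    target = [0.0] * num_classes
    for t in tags:
        target[t] = 1.0
    return target

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_label_bce(logits, target):
    """Mean binary cross-entropy over independent per-class sigmoid outputs.

    Assumption: this loss is illustrative, not taken from the paper.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, target):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
    return total / len(logits)

# Hypothetical 5-category vocabulary; the image carries tags 0 and 3.
target = multi_hot({0, 3}, num_classes=5)

# Logits that score both tags highly, versus logits that miss tag 3.
good_logits = [4.0, -4.0, -4.0, 4.0, -4.0]
bad_logits = [4.0, -4.0, -4.0, -4.0, -4.0]

loss_good = multi_label_bce(good_logits, target)
loss_bad = multi_label_bce(bad_logits, target)
print(loss_good < loss_bad)
```

Unlike a softmax over categories, each output here is scored on its own, so a model is rewarded for recognizing every tag present in the image rather than only one.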

Related content

Representation learning

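Representation learning uses training data to learn vector representations, which overcomes the limitations of hand-crafted methods. It is usually divided into two broad classes: unsupervised and supervised. Most unsupervised methods use the latent variables of an autoencoder (for example, a denoising or sparse autoencoder) as the representation. The more recent variational autoencoders tolerate noise and outliers better. However, exactly inferring the latent structure of given data is almost impossible, so several approximate inference strategies are used instead. In addition, some unsupervised methods aim to approximate a specific similarity measure. One unsupervised similarity-preserving framework uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original series. Supervised methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two currently popular models; their goal is to maximize the distance between classes and minimize the distance within each class.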
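For reference, the DTW distance that similarity-preserving methods try to approximate can be computed with the classic dynamic program. This is a minimal sketch, assuming one-dimensional series and absolute difference as the local cost; the function name is illustrative and does not come from any particular library.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The second series repeats its first sample, so warping aligns them exactly.
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))
# A constant offset cannot be warped away.
print(dtw_distance([0, 0], [1, 1]))
```

The table costs O(nm) time and memory per pair of series, which is one reason to learn an embedding whose Euclidean distance approximates DTW rather than computing DTW for every pair.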
