利用小型数据集对视觉变异器进行有效培训 (Efficient Training of Visual Transformers with Small-Size Datasets) - 专知论文

会员服务 ·

0

VTS · Extensibility · 模型评估 · 稳健性 · 变换 ·

2021 年 6 月 7 日

Efficient Training of Visual Transformers with Small-Size Datasets

翻译：利用小型数据集对视觉变异器进行有效培训

Yahui Liu,Enver Sangineto,Wei Bi,Nicu Sebe,Bruno Lepri,Marco De Nadai

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger representation capacity. However, the lack of the typical convolutional inductive bias makes these models more data-hungry than common CNNs. In fact, some local properties of the visual domain which are embedded in the CNN architectural design, in VTs should be learned from samples. In this paper, we empirically analyse different VTs, comparing their robustness in a small training-set regime, and we show that, despite having a comparable accuracy when trained on ImageNet, their performance on smaller datasets can be largely different. Moreover, we propose a self-supervised task which can extract additional information from images with only a negligible computational overhead. This task encourages the VTs to learn spatial relations within an image and makes the VT training much more robust when training data are scarce. Our task is used jointly with the standard (supervised) training and it does not depend on specific architectural choices, thus it can be easily plugged in the existing VTs. Using an extensive evaluation with different VTs and datasets, we show that our method can improve (sometimes dramatically) the final accuracy of the VTs. The code will be available upon acceptance.

翻译：视觉变异器(VTs)正在作为革命网络(CNNs)的建筑范式。与CNN不同, VTs可以捕捉图像元素之间的全球关系,而且它们可能具有更大的代表性。但是,由于典型的进化偏差的典型特征,这些模型比普通CNN(CNN)更多的数据饥饿。事实上,在CNN(VTs)的建筑设计中嵌入的视觉域的一些本地属性应该从样本中学习。在本文中,我们实验性地分析不同的VT,比较其在小型培训系统中的强健性,并且我们表明,尽管在图像网培训时,它们具有可比的准确性,但其在较小数据集上的性能却可能大不相同。此外,我们提议了一种自我监督的任务,它可以从图像中提取额外的信息,只有微不足道的计算间接费用。这项任务鼓励VTs从一个图像中学习空间关系,并在培训数据稀少时使VT培训更加稳健。我们的任务与标准( 超视) 培训同时使用,并不取决于具体的建筑选择,因此, 其性能轻易地在现有的VT中显示我们最终的准确度。

1

相关内容

VTS

VTS：VLSI Test Symposium Explanation：超大规模集成电路测试研讨会。 Publisher：IEEE。 SIT： http://dblp.uni-trier.de/db/conf/vts/

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

29+阅读 · 2021年7月30日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

65+阅读 · 2020年7月25日

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

36+阅读 · 2020年3月27日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

24+阅读 · 2020年3月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

42+阅读 · 2020年1月28日

【佐治亚理工学院】大规模的视觉对话的预训练:一个简单的最先进的基线（Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline）

【佐治亚理工学院】大规模的视觉对话的预训练:一个简单的最先进的基线（Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline）

专知会员服务

4+阅读 · 2019年12月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

已删除

将门创投

4+阅读 · 2019年4月1日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

14+阅读 · 2018年5月29日

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

Arxiv

0+阅读 · 2021年7月30日

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Arxiv

0+阅读 · 2021年7月30日

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

Arxiv

1+阅读 · 2021年7月29日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Arxiv

6+阅读 · 2020年10月26日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

11+阅读 · 2020年6月23日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Quadruplet Network with One-Shot Learning for Fast Visual Object Tracking

Arxiv

9+阅读 · 2018年3月17日

Depth-Adaptive Computational Policies for Efficient Visual Tracking

Arxiv

8+阅读 · 2018年1月1日

VIP会员

文章信息

相关主题

相关VIP内容

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

29+阅读 · 2021年7月30日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

65+阅读 · 2020年7月25日

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

36+阅读 · 2020年3月27日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

24+阅读 · 2020年3月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

42+阅读 · 2020年1月28日

【佐治亚理工学院】大规模的视觉对话的预训练:一个简单的最先进的基线（Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline）

【佐治亚理工学院】大规模的视觉对话的预训练:一个简单的最先进的基线（Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline）

专知会员服务

4+阅读 · 2019年12月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

热门VIP内容

相关资讯

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

已删除

将门创投

4+阅读 · 2019年4月1日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

14+阅读 · 2018年5月29日

相关论文

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

Arxiv

0+阅读 · 2021年7月30日

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Arxiv

0+阅读 · 2021年7月30日

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

Arxiv

1+阅读 · 2021年7月29日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Arxiv

6+阅读 · 2020年10月26日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

11+阅读 · 2020年6月23日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Quadruplet Network with One-Shot Learning for Fast Visual Object Tracking

Arxiv

9+阅读 · 2018年3月17日

Depth-Adaptive Computational Policies for Efficient Visual Tracking

Arxiv

8+阅读 · 2018年1月1日

微信扫码咨询专知VIP会员