TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers - Zhuanzhi (专知) Papers


Vision · Performer · Transformation · Data augmentation · Model evaluation

September 12, 2022

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers


Jihao Liu, Boxiao Liu, Hang Zhou, Hongsheng Li, Yu Liu

From arXiv, ECCV 2022; Code: https://github.com/Sense-X/TokenMix

CutMix is a popular augmentation technique commonly used for training modern convolutional and transformer vision networks. It was originally designed to encourage Convolutional Neural Networks (CNNs) to focus more on an image's global context instead of local information, which greatly improves the performance of CNNs. However, we found it to have limited benefits for transformer-based architectures, which naturally have a global receptive field. In this paper, we propose a novel data augmentation technique, TokenMix, to improve the performance of vision transformers. TokenMix mixes two images at the token level by partitioning the mixing region into multiple separated parts. We also show that the mixed learning target in CutMix, a linear combination of a pair of ground-truth labels, might be inaccurate and sometimes counter-intuitive. To obtain a more suitable target, we propose to assign the target score according to the content-based neural activation maps of the two images from a pre-trained teacher model, which does not need to have high performance. Through extensive experiments on various vision transformer architectures, we show that TokenMix helps vision transformers focus on the foreground area to infer the classes and enhances their robustness to occlusion, with consistent performance gains. Notably, we improve DeiT-T/S/B by +1% ImageNet top-1 accuracy. TokenMix also benefits from longer training, achieving 81.2% top-1 accuracy on ImageNet with DeiT-S trained for 400 epochs. Code is available at https://github.com/Sense-X/TokenMix.
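The two ideas in the abstract, token-level mixing over several separated regions and a target weighted by teacher activation maps, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the block-sampling scheme, the `max_block` parameter, and the choice to leave the target unnormalized are all assumptions made for illustration, and the activation maps are taken as given inputs, not computed from a real teacher model.

```python
import numpy as np


def token_mask(grid, ratio, rng, max_block=3):
    """Boolean (grid, grid) mask built from several small rectangular blocks.

    True marks tokens taken from image B. Blocks are added until at least
    `ratio` of the tokens are covered, so the mixed region is split into
    multiple parts instead of a single rectangle as in CutMix.
    (Hypothetical sampling scheme; the abstract does not specify one.)
    """
    mask = np.zeros((grid, grid), dtype=bool)
    target = int(round(ratio * grid * grid))
    while mask.sum() < target:
        h, w = rng.integers(1, max_block + 1, size=2)
        top = rng.integers(0, grid - h + 1)
        left = rng.integers(0, grid - w + 1)
        mask[top:top + h, left:left + w] = True
    return mask


def mix_images(img_a, img_b, mask, patch):
    """Replace the masked patches of img_a with those of img_b.

    Images are (H, W, C) arrays with H = W = grid * patch.
    """
    # Expand the token-level mask to pixel resolution.
    pixel_mask = mask.repeat(patch, axis=0).repeat(patch, axis=1)
    return np.where(pixel_mask[..., None], img_b, img_a)


def mixed_target(label_a, label_b, act_a, act_b, mask, num_classes):
    """Soft label weighted by activation inside each image's visible region.

    act_a and act_b are non-negative (grid, grid) activation maps, assumed
    to come from a pre-trained teacher; each is normalized to sum to 1.
    """
    act_a = act_a / act_a.sum()
    act_b = act_b / act_b.sum()
    target = np.zeros(num_classes)
    target[label_a] += act_a[~mask].sum()  # share of A's activation kept
    target[label_b] += act_b[mask].sum()   # share of B's activation pasted in
    return target


rng = np.random.default_rng(0)
grid, patch = 14, 16                       # 224x224 image, 16x16 patches
mask = token_mask(grid, ratio=0.5, rng=rng)
img_a = np.zeros((grid * patch, grid * patch, 3))
img_b = np.ones((grid * patch, grid * patch, 3))
mixed = mix_images(img_a, img_b, mask, patch)

# Activation concentrated in the top-left corner of image A (toy example).
act_a = np.zeros((grid, grid))
act_a[:4, :4] = 1.0
act_b = np.ones((grid, grid))
target = mixed_target(3, 7, act_a, act_b, mask, num_classes=10)
```

With uniform activation maps the target reduces to the area-based label of CutMix. With a concentrated map such as `act_a` above, the score for class 3 depends on how much of that corner survives the mask, not on the overall area ratio.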

