重新思考愿景转变者的空间层面 (Rethinking Spatial Dimensions of Vision Transformers) - 专知论文

会员服务 ·

0

Vision · 变换 · Principle · 卷积神经网络 · Extensibility ·

2021 年 3 月 30 日

Rethinking Spatial Dimensions of Vision Transformers

翻译：重新思考愿景转变者的空间层面

Byeongho Heo,Sangdoo Yun,Dongyoon Han,Sanghyuk Chun,Junsuk Choe,Seong Joon Oh

from arxiv, 10 pages, 5 figures

Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks as being an alternative architecture against the existing convolutional neural networks (CNN). Since the transformer-based architecture has been innovative for computer vision modeling, the design convention towards an effective architecture has been less studied yet. From the successful design principles of CNN, we investigate the role of the spatial dimension conversion and its effectiveness on the transformer-based architecture. We particularly attend the dimension reduction principle of CNNs; as the depth increases, a conventional CNN increases channel dimension and decreases spatial dimensions. We empirically show that such a spatial dimension reduction is beneficial to a transformer architecture as well, and propose a novel Pooling-based Vision Transformer (PiT) upon the original ViT model. We show that PiT achieves the improved model capability and generalization performance against ViT. Throughout the extensive experiments, we further show PiT outperforms the baseline on several tasks such as image classification, object detection and robustness evaluation. Source codes and ImageNet models are available at https://github.com/naver-ai/pit

翻译：视觉变异器(VIT)将变压器的应用范围从语言处理扩大到计算机的变压器应用范围,将变压器的应用范围从语言处理扩大到计算机的视觉任务,作为针对现有变动神经网络(CNN)的替代结构。由于以变压器为基础的结构对于计算机的视觉建模来说是创新的,因此对有效结构的设计公约的研究较少。从CNN的成功设计原则来看,我们调查空间维度转换的作用及其对变压器结构在变压器结构中的有效性。我们特别参与了CNN的降低维度原则;随着深度的提高,常规CNN的CNN频道增加了频道的维度,降低了空间维度。我们从经验上表明,这种空间尺寸的减压也有益于变压器结构,并在原VT模型上提出了一个新的基于集合的视觉变压器(PiT)。我们显示,PiT在广泛的实验中,在图像分类、物体探测和坚固度评估等几项任务上,比基准的基线还要高。源码和图像网络模型见https://github.com-naver-ai/pit/pit/pit/pital/pit

0

相关内容

Vision

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

279+阅读 · 2020年11月26日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

17+阅读 · 2020年11月17日

【ST2020硬核课】深度神经网络，57页ppt

【ST2020硬核课】深度神经网络，57页ppt

专知会员服务

43+阅读 · 2020年8月19日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

119+阅读 · 2020年5月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

116+阅读 · 2020年2月3日

【CAAI 2019】XLNet and Beyond，杨植麟，联合创始人，循环智能（Recurrent AI）

【CAAI 2019】XLNet and Beyond，杨植麟，联合创始人，循环智能（Recurrent AI）

专知会员服务

13+阅读 · 2019年12月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

166+阅读 · 2019年10月13日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

轻量attention模块：Spatial Group-wise Enhance

轻量attention模块：Spatial Group-wise Enhance

极市平台

15+阅读 · 2019年7月3日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

一文纵览 Vision-and-Language 领域最新研究与进展

一文纵览 Vision-and-Language 领域最新研究与进展

AI科技评论

7+阅读 · 2019年5月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

注意力机制(Attention)最新综述论文及相关源码

注意力机制(Attention)最新综述论文及相关源码

人工智能学家

30+阅读 · 2018年11月17日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新5篇目标检测相关论文——显著目标检测、弱监督One-Shot检测、多框检测器、携带物体检测、假彩色图像检测

【论文推荐】最新5篇目标检测相关论文——显著目标检测、弱监督One-Shot检测、多框检测器、携带物体检测、假彩色图像检测

专知

74+阅读 · 2018年1月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

10+阅读 · 2015年7月1日

Synthesizer: Rethinking Self-Attention in Transformer Models

Synthesizer: Rethinking Self-Attention in Transformer Models

Arxiv

0+阅读 · 2021年5月24日

Rethinking the Design Principles of Robust Vision Transformer

Arxiv

0+阅读 · 2021年5月23日

Intriguing Properties of Vision Transformers

Arxiv

0+阅读 · 2021年5月21日

Content-Augmented Feature Pyramid Network with Light Linear Transformers

Arxiv

0+阅读 · 2021年5月20日

SparseBERT: Rethinking the Importance Analysis in Self-attention

SparseBERT: Rethinking the Importance Analysis in Self-attention

Arxiv

7+阅读 · 2021年2月25日

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Arxiv

10+阅读 · 2020年12月31日

Rethinking Attention with Performers

Arxiv

3+阅读 · 2020年9月30日

Doubly Attentive Transformer Machine Translation

Doubly Attentive Transformer Machine Translation

Arxiv

4+阅读 · 2018年7月30日

Scaling Neural Machine Translation

Arxiv

3+阅读 · 2018年6月1日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

卷积神经网络

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

279+阅读 · 2020年11月26日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

17+阅读 · 2020年11月17日

【ST2020硬核课】深度神经网络，57页ppt

【ST2020硬核课】深度神经网络，57页ppt

专知会员服务

43+阅读 · 2020年8月19日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

119+阅读 · 2020年5月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

116+阅读 · 2020年2月3日

【CAAI 2019】XLNet and Beyond，杨植麟，联合创始人，循环智能（Recurrent AI）

【CAAI 2019】XLNet and Beyond，杨植麟，联合创始人，循环智能（Recurrent AI）

专知会员服务

13+阅读 · 2019年12月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

注意力机制介绍，Attention Mechanism

注意力机制介绍，Attention Mechanism

专知会员服务

166+阅读 · 2019年10月13日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

热门VIP内容

相关资讯

轻量attention模块：Spatial Group-wise Enhance

轻量attention模块：Spatial Group-wise Enhance

极市平台

15+阅读 · 2019年7月3日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

一文纵览 Vision-and-Language 领域最新研究与进展

一文纵览 Vision-and-Language 领域最新研究与进展

AI科技评论

7+阅读 · 2019年5月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

注意力机制(Attention)最新综述论文及相关源码

注意力机制(Attention)最新综述论文及相关源码

人工智能学家

30+阅读 · 2018年11月17日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新5篇目标检测相关论文——显著目标检测、弱监督One-Shot检测、多框检测器、携带物体检测、假彩色图像检测

【论文推荐】最新5篇目标检测相关论文——显著目标检测、弱监督One-Shot检测、多框检测器、携带物体检测、假彩色图像检测

专知

74+阅读 · 2018年1月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

10+阅读 · 2015年7月1日

相关论文

Synthesizer: Rethinking Self-Attention in Transformer Models

Synthesizer: Rethinking Self-Attention in Transformer Models

Arxiv

0+阅读 · 2021年5月24日

Rethinking the Design Principles of Robust Vision Transformer

Arxiv

0+阅读 · 2021年5月23日

Intriguing Properties of Vision Transformers

Arxiv

0+阅读 · 2021年5月21日

Content-Augmented Feature Pyramid Network with Light Linear Transformers

Arxiv

0+阅读 · 2021年5月20日

SparseBERT: Rethinking the Importance Analysis in Self-attention

SparseBERT: Rethinking the Importance Analysis in Self-attention

Arxiv

7+阅读 · 2021年2月25日

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Arxiv

10+阅读 · 2020年12月31日

Rethinking Attention with Performers

Arxiv

3+阅读 · 2020年9月30日

Doubly Attentive Transformer Machine Translation

Doubly Attentive Transformer Machine Translation

Arxiv

4+阅读 · 2018年7月30日

Scaling Neural Machine Translation

Arxiv

3+阅读 · 2018年6月1日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员