Sliced Recursive Transformer - Zhuanzhi Papers

July 26, 2022

Sliced Recursive Transformer

Zhiqiang Shen,Zechun Liu,Eric Xing

From arXiv; ECCV 2022; 31 pages with appendix. Code and models are available at https://github.com/szq0214/SReT (v3: updated license and fixed arXiv timestamp).

We present a neat yet effective recursive operation on vision transformers that improves parameter utilization without adding parameters. This is achieved by sharing weights across the depth of the transformer network. The proposed method obtains a substantial gain (~2%) using only a naive recursive operation, requires no special or sophisticated knowledge of network design principles, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining the superior accuracy, we propose an approximation method based on multiple sliced group self-attentions across recursive layers, which reduces the computational cost by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is compatible with a broad range of other designs for efficient ViT architectures. Our best model achieves a significant improvement on ImageNet-1K over state-of-the-art methods while containing fewer parameters. The proposed weight-sharing mechanism with a sliced recursion structure makes it easy to build a transformer with more than 100 or even 1000 shared layers while keeping a compact size (13~15M parameters), which avoids the optimization difficulties that arise when a model is too large. This flexible scalability shows great potential for scaling up models and constructing extremely deep vision transformers. Code is available at https://github.com/szq0214/SReT.
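The two ideas in the abstract, reusing the same weights across depth and splitting attention into groups, can be illustrated with a small toy sketch. This is not the authors' implementation (see the repository linked above for that). The helper names `make_block`, `apply_block`, `forward`, `count_params`, `effective_depth`, and `attention_pair_count` are hypothetical, and each "block" is only a single weight matrix with a residual `tanh` update standing in for a full transformer block. The pair count covers only the quadratic token-interaction term of self-attention, so it does not reproduce the 10~30% overall figure reported in the abstract.

```python
import math
import random


def make_block(dim, seed):
    """Create one toy block: a dim x dim weight matrix (stand-in for a transformer block)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, x):
    """Residual update x + tanh(W x); a toy stand-in for attention plus MLP."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def forward(blocks, x, recursions=1):
    """Apply every block `recursions` times in a row, reusing the same weights each time."""
    for weights in blocks:
        for _ in range(recursions):
            x = apply_block(weights, x)
    return x


def count_params(blocks):
    """Number of stored weights; independent of how often each block is reused."""
    return sum(len(row) for weights in blocks for row in weights)


def effective_depth(blocks, recursions):
    """Number of block applications in one forward pass."""
    return len(blocks) * recursions


def attention_pair_count(num_tokens, num_groups=1):
    """Token-pair interactions when tokens are split into equal groups attended independently."""
    if num_tokens % num_groups != 0:
        raise ValueError("num_tokens must be divisible by num_groups")
    per_group = num_tokens // num_groups
    return num_groups * per_group * per_group


if __name__ == "__main__":
    blocks = [make_block(dim=4, seed=i) for i in range(3)]
    for recursions in (1, 2):
        out = forward(blocks, [1.0] * 4, recursions=recursions)
        print(recursions, effective_depth(blocks, recursions), count_params(blocks), len(out))
    # 196 tokens: 196 * 196 = 38416 pairs with full attention, 4 * 49 * 49 = 9604 with 4 groups
    print(attention_pair_count(196), attention_pair_count(196, num_groups=4))
```

In this toy setting, doubling `recursions` doubles the number of block applications while `count_params` stays the same, which is the sense in which recursion adds depth without adding parameters. Splitting the tokens into G equal groups divides the pair count by G, which shows why grouped attention can offset part of the extra compute that recursion introduces.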
