快速和轻量级Transformer的实用调查 (A Practical Survey on Faster and Lighter Transformers) - 专知论文

会员服务 ·

0

序列 · Transformer · 低复杂度 · 存储 · 多序列 ·

2023 年 3 月 27 日

A Practical Survey on Faster and Lighter Transformers

翻译：快速和轻量级Transformer的实用调查

Quentin Fournier,Gaétan Marceau Caron,Daniel Aloise

from arxiv, ACM Computing Surveys; 40 pages, 18 figures, 4 tables

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model solely based on the attention mechanism that is able to relate any two positions of the input sequence, hence modelling arbitrary long dependencies. The Transformer has improved the state-of-the-art across numerous sequence modelling tasks. However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length, hindering its adoption. Fortunately, the deep learning community has always been interested in improving the models' efficiency, leading to a plethora of solutions such as parameter sharing, pruning, mixed-precision, and knowledge distillation. Recently, researchers have directly addressed the Transformer's limitation by designing lower-complexity alternatives such as the Longformer, Reformer, Linformer, and Performer. However, due to the wide range of solutions, it has become challenging for researchers and practitioners to determine which methods to apply in practice in order to meet the desired trade-off between capacity, computation, and memory. This survey addresses this issue by investigating popular approaches to make Transformers faster and lighter and by providing a comprehensive explanation of the methods' strengths, limitations, and underlying assumptions.

翻译：循环神经网络是处理序列的有效模型。然而，由于其固有的顺序特性，它们无法学习长期依赖性。作为解决方案，Vaswani等人引入了Transformer，这是一种仅基于注意机制的模型，能够关联输入序列的任意两个位置，因此模拟任意长的依赖关系。Transformer已经改进了许多序列建模任务的最新技术。然而，其有效性是以与序列长度成二次计算和存储复杂度为代价的，这阻碍了其采用。幸运的是，深度学习社区一直致力于提高模型效率，导致了大量的解决方案，如参数共享、修剪、混合精度和知识蒸馏。最近，研究人员通过设计低复杂度的替代方法（如Longformer、Reformer、Linformer和Performer）直接解决了Transformer的限制。然而，由于解决方案种类繁多，研究人员和从业者很难确定要在实践中应用哪些方法，以满足所需的容量、计算和存储之间的权衡。本调查通过调查使Transformer更快并变轻的流行方法，并全面解释方法的优点、局限性和基本假设，以解决这个问题。

0

相关内容

数学上，序列是被排成一列的对象（或事件）；这样每个元素不是在其他元素之前，就是在其他元素之后。这里，元素之间的顺序非常重要。

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

50+阅读 · 2022年10月2日

【图神经网络实用介绍】A practical introduction to GNNs - Part 1

【图神经网络实用介绍】A practical introduction to GNNs - Part 1

专知会员服务

28+阅读 · 2022年2月28日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

319+阅读 · 2020年11月26日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Python最佳实践、技巧与提示30则】《30 Python Best Practices, Tips, And Tricks》by Erik-Jan van Baaren

【Python最佳实践、技巧与提示30则】《30 Python Best Practices, Tips, And Tricks》by Erik-Jan van Baaren

专知会员服务

35+阅读 · 2020年1月6日

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

专知会员服务

18+阅读 · 2019年11月30日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

专知

21+阅读 · 2018年6月18日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

肺内皮细胞S1PR1受体在流感病毒所致ARDS中的作用

国家自然科学基金

1+阅读 · 2014年12月31日

肾小球内皮细胞表达的swiprosin-1在早期糖尿病肾病发病中的作用及机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

多尺度电磁场问题的高效数值方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

腺苷A2A受体在心肌肥厚中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于内嵌金属富勒烯的有机太阳能电池受体材料的研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型计算环境下的排序问题

国家自然科学基金

0+阅读 · 2012年12月31日

动脉粥样硬化中法尼酯X受体下调巨噬细胞中肝脂酶表达的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

量子discord及其在量子计算中的研究

国家自然科学基金

1+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

HyHTM: Hyperbolic Geometry based Hierarchical Topic Models

Arxiv

0+阅读 · 2023年5月16日

A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions

Arxiv

16+阅读 · 2023年2月9日

Efficient Transformers: A Survey

Arxiv

35+阅读 · 2022年3月14日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

A Survey of Transformers

Arxiv

103+阅读 · 2021年6月8日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

Aspect Based Sentiment Analysis with Gated Convolutional Networks

Arxiv

12+阅读 · 2018年5月18日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

50+阅读 · 2022年10月2日

【图神经网络实用介绍】A practical introduction to GNNs - Part 1

【图神经网络实用介绍】A practical introduction to GNNs - Part 1

专知会员服务

28+阅读 · 2022年2月28日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

319+阅读 · 2020年11月26日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Python最佳实践、技巧与提示30则】《30 Python Best Practices, Tips, And Tricks》by Erik-Jan van Baaren

【Python最佳实践、技巧与提示30则】《30 Python Best Practices, Tips, And Tricks》by Erik-Jan van Baaren

专知会员服务

35+阅读 · 2020年1月6日

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

【论文|Google】基于元学习的排序架构，Ranking architectures using meta-learning

专知会员服务

18+阅读 · 2019年11月30日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【斯坦福博士论文】计算受限的持续学习：基础与算法

生成式人工智能时代的多目标推荐：最新进展与未来展望综述

AI大模型技术在电力系统中的应用及发展趋势

【ICML2025】SparseLoRA：利用上下文稀疏性加速大语言模型微调

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

专知

21+阅读 · 2018年6月18日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

相关论文

HyHTM: Hyperbolic Geometry based Hierarchical Topic Models

Arxiv

0+阅读 · 2023年5月16日

A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions

Arxiv

16+阅读 · 2023年2月9日

Efficient Transformers: A Survey

Arxiv

35+阅读 · 2022年3月14日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

A Survey of Transformers

Arxiv

103+阅读 · 2021年6月8日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

Aspect Based Sentiment Analysis with Gated Convolutional Networks

Arxiv

12+阅读 · 2018年5月18日

相关基金

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

肺内皮细胞S1PR1受体在流感病毒所致ARDS中的作用

国家自然科学基金

1+阅读 · 2014年12月31日

肾小球内皮细胞表达的swiprosin-1在早期糖尿病肾病发病中的作用及机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

多尺度电磁场问题的高效数值方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

腺苷A2A受体在心肌肥厚中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于内嵌金属富勒烯的有机太阳能电池受体材料的研究

国家自然科学基金

0+阅读 · 2012年12月31日

新型计算环境下的排序问题

国家自然科学基金

0+阅读 · 2012年12月31日

动脉粥样硬化中法尼酯X受体下调巨噬细胞中肝脂酶表达的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

量子discord及其在量子计算中的研究

国家自然科学基金

1+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员