短列、试验长列列列列列:注意线性线性线性线性线性线性线性线性线性线性线性线性线性使输入长度外推法 (Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation) - 专知论文

会员服务 ·

0

有偏 · 位置嵌入 · 线性的 · 注意力机制 · MoDELS ·

2022 年 4 月 22 日

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

翻译：短列、试验长列列列列列:注意线性线性线性线性线性线性线性线性线性线性线性线性线性使输入长度外推法

Ofir Press,Noah A. Smith,Mike Lewis

Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training? We first show that extrapolation can be enabled by simply changing the position representation method, though we find that current methods do not allow for efficient extrapolation. We therefore introduce a simpler and more efficient position method, Attention with Linear Biases (ALiBi). ALiBi does not add positional embeddings to word embeddings; instead, it biases query-key attention scores with a penalty that is proportional to their distance. We show that this method trains a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048 but training 11% faster and using 11% less memory. ALiBi's inductive bias towards recency also leads it to outperform multiple strong position methods on the WikiText-103 benchmark.

翻译：自Vaswani等人(2017年)引入变压器模型以来,一个根本问题尚未回答:一个模型如何在比培训时更长的序列的推论时间得出外推法?我们首先显示,只要改变位置代表法,就可以进行外推法,尽管我们发现,目前的方法不允许有效外推法。因此,我们引入了一个更简单、更高效的定位方法,即用Linear Biases(ALiBi)来关注定位嵌入;ALiBi没有在单词嵌入中添加定位嵌入;相反,它偏向调心引力计分数,其惩罚与其距离成正比。我们显示,该方法在长度为1024的输入序列上培养了13亿个参数模型,该输入序列外推到长度为2048年的输入序列上,实现与在2048年输入线上训练的正值模型相同的曲解性,但培训速度要快11%,记忆要少11%。 ALiBi的感应偏向适应性也使其超越WikText-103基准上的多重强定位方法。

0

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

74+阅读 · 2022年3月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

76+阅读 · 2020年7月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

240+阅读 · 2020年4月19日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

59+阅读 · 2020年3月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

169+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

26+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Omi/HtrA2在运动性骨骼肌损伤中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Corin介导的ANP活化在动脉粥样硬化形成及其炎症反应中的作用与机制

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

几个非线性Schrodinger方程组模型及相关问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

球孢链霉菌转录调节基因atrA的多效性调控机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

多巴胺受体对α/β1-、AT1受体抑制作用在高血压病发生中的作用和机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

CNTF激活的Ast与神经元间的对话交流在癫痫发病机制中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

晶界内耗的影响因素和微观机制的进一步研究

国家自然科学基金

0+阅读 · 2008年12月31日

Accurate and efficient Simulation of very high-dimensional Neural Mass Models with distributed-delay Connectome Tensors

Arxiv

0+阅读 · 2022年6月10日

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Arxiv

0+阅读 · 2022年6月7日

Lottery Tickets with Nonzero Biases

Arxiv

0+阅读 · 2022年6月7日

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

Arxiv

0+阅读 · 2022年6月6日

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Arxiv

12+阅读 · 2020年4月15日

Efficiently Embedding Dynamic Knowledge Graphs

Efficiently Embedding Dynamic Knowledge Graphs

Arxiv

14+阅读 · 2019年10月15日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

Bilinear Attention Networks

Arxiv

11+阅读 · 2018年5月21日

Self-Attention with Relative Position Representations

Arxiv

27+阅读 · 2018年4月12日

Convolutional 2D Knowledge Graph Embeddings

Arxiv

28+阅读 · 2018年4月6日

VIP会员

文章信息

相关主题

注意力机制

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

74+阅读 · 2022年3月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

76+阅读 · 2020年7月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

240+阅读 · 2020年4月19日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

59+阅读 · 2020年3月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

169+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

热门VIP内容

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

26+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Accurate and efficient Simulation of very high-dimensional Neural Mass Models with distributed-delay Connectome Tensors

Arxiv

0+阅读 · 2022年6月10日

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Arxiv

0+阅读 · 2022年6月7日

Lottery Tickets with Nonzero Biases

Arxiv

0+阅读 · 2022年6月7日

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

Arxiv

0+阅读 · 2022年6月6日

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Arxiv

12+阅读 · 2020年4月15日

Efficiently Embedding Dynamic Knowledge Graphs

Efficiently Embedding Dynamic Knowledge Graphs

Arxiv

14+阅读 · 2019年10月15日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

Bilinear Attention Networks

Arxiv

11+阅读 · 2018年5月21日

Self-Attention with Relative Position Representations

Arxiv

27+阅读 · 2018年4月12日

Convolutional 2D Knowledge Graph Embeddings

Arxiv

28+阅读 · 2018年4月6日

相关基金

Omi/HtrA2在运动性骨骼肌损伤中的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Corin介导的ANP活化在动脉粥样硬化形成及其炎症反应中的作用与机制

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

几个非线性Schrodinger方程组模型及相关问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

球孢链霉菌转录调节基因atrA的多效性调控机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

多巴胺受体对α/β1-、AT1受体抑制作用在高血压病发生中的作用和机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

CNTF激活的Ast与神经元间的对话交流在癫痫发病机制中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

晶界内耗的影响因素和微观机制的进一步研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员