Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-spaces and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile), reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
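As a rough illustration of the construction described above, the sketch below shows an order-2 Hyena-style operator in PyTorch: input projections produce a value stream plus gating signals, and the output is formed by alternating FFT-based long convolutions with elementwise (data-controlled) gating. This is a minimal sketch, not the paper's implementation: the names `HyenaSketch` and `fft_long_conv` are illustrative, an explicit learned filter tensor stands in for the implicit (FFN-based) filter parametrization, and details such as the short convolutions in the projections are omitted.

```python
# Minimal sketch of an order-2 Hyena-style operator (assumed PyTorch environment).
# Explicit filters replace the paper's implicit parametrization; projection details
# (short convolutions, multi-head layout) are omitted for brevity.
import torch
import torch.nn as nn


def fft_long_conv(z: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Causal long convolution of z (B, L, D) with filter h (L, D) via FFT."""
    L = z.shape[1]
    # Zero-pad to length 2L so the circular FFT convolution becomes a linear one.
    z_f = torch.fft.rfft(z, n=2 * L, dim=1)
    h_f = torch.fft.rfft(h, n=2 * L, dim=0)
    y = torch.fft.irfft(z_f * h_f.unsqueeze(0), n=2 * L, dim=1)[:, :L]
    return y


class HyenaSketch(nn.Module):
    """Interleaved long convolutions and data-controlled gating (order-2 sketch)."""

    def __init__(self, d_model: int, max_len: int, order: int = 2):
        super().__init__()
        self.order = order
        # One projection producing the value stream v and the gating signals x_1..x_N.
        self.in_proj = nn.Linear(d_model, d_model * (order + 1))
        self.out_proj = nn.Linear(d_model, d_model)
        # Explicit per-order filters; the paper instead parametrizes these implicitly.
        self.filters = nn.Parameter(torch.randn(order, max_len, d_model) * 0.02)

    def forward(self, u: torch.Tensor) -> torch.Tensor:  # u: (B, L, D)
        L = u.shape[1]
        projections = self.in_proj(u).chunk(self.order + 1, dim=-1)
        v, gates = projections[0], projections[1:]
        z = v
        for n in range(self.order):
            # Long convolution followed by elementwise (data-controlled) gating.
            z = gates[n] * fft_long_conv(z, self.filters[n, :L])
        return self.out_proj(z)
```

Because the convolutions are evaluated with FFTs, the whole operator runs in O(L log L) time in the sequence length L, which is the source of the subquadratic scaling claimed above.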