长程语言模型实际使用长程背景吗? (Do Long-Range Language Models Actually Use Long-Range Context?) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · Perplexity · INFORMS · Performer ·

2021 年 9 月 19 日

Do Long-Range Language Models Actually Use Long-Range Context?

翻译：长程语言模型实际使用长程背景吗?

Simeng Sun,Kalpesh Krishna,Andrew Mattarella-Micke,Mohit Iyyer

from arxiv, EMNLP2021

Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer language models (including the \emph{Routing Transformer}, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).

翻译：语言模型通常在短短的输入序列上接受培训,这限制了他们使用长距离范围内的谈话级信息的能力,从而改进预测。最近提高自我关注效率的努力导致长程变换语言模型的扩散,这些模型可以处理比过去模型长得多的顺序。然而,这些模型利用长程背景的方式仍然不明确。在本文件中,我们对两种长程变换语言模型(包括\emph{Routing tranger})进行精细分析,这些模型在PG-19长序列LM基准数据集上达到最先进的迷惑状态,接受最高达8K标记的输入序列。我们的结果显示,这些模型提供长程背景(即超过以前的2K标志)的方式只能改进对一小套符号的预测(例如,可以从远处复制的变换换),并且不会帮助所有层次预测任务。最后,我们发现P-19最远的版本和最新一代的版本的版本的版本(我们发现P-19和最新一代的版本的版本的版本的版本)和最新一代的版本的版本。最后,我们发现P-19的版本的版本的版本和新版本的版本的版本的版本的文件。

0

相关内容

语言模型化

语言模型化

最新「基于Transformer的预训练模型」综述论文，42页pdf304篇文献

最新「基于Transformer的预训练模型」综述论文，42页pdf304篇文献

专知会员服务

102+阅读 · 2021年8月13日

注意力机制综述

注意力机制综述

专知会员服务

80+阅读 · 2021年1月26日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

274+阅读 · 2020年11月26日

【万字长文】注意力机制可解释大论述

专知会员服务

50+阅读 · 2020年11月17日

自然语言处理中的注意力机制，Attention in Natural Language Processing

自然语言处理中的注意力机制，Attention in Natural Language Processing

专知会员服务

131+阅读 · 2020年5月30日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

90+阅读 · 2020年4月18日

【斯坦福大学AI】BERT, ELMo， & GPT-2:上下文化的单词表示是怎样的?

专知会员服务

34+阅读 · 2020年3月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

人工智能 | AAAI 2019等国际会议信息7条

人工智能 | AAAI 2019等国际会议信息7条

Call4Papers

4+阅读 · 2018年9月3日

【论文推荐】最新六篇序列推荐相关论文—卷积序列嵌入学习、用户记忆网络、上下文GRU、迁移学习

【论文推荐】最新六篇序列推荐相关论文—卷积序列嵌入学习、用户记忆网络、上下文GRU、迁移学习

专知

10+阅读 · 2018年4月8日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction

Arxiv

0+阅读 · 2021年11月4日

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Arxiv

10+阅读 · 2021年10月12日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

An Attentive Survey of Attention Models

Arxiv

17+阅读 · 2019年4月5日

Context in Neural Machine Translation: A Review of Models and Evaluations

Arxiv

5+阅读 · 2019年1月25日

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Arxiv

4+阅读 · 2019年1月9日

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Self-Attention with Relative Position Representations

Arxiv

14+阅读 · 2018年3月6日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

最新「基于Transformer的预训练模型」综述论文，42页pdf304篇文献

最新「基于Transformer的预训练模型」综述论文，42页pdf304篇文献

专知会员服务

102+阅读 · 2021年8月13日

注意力机制综述

注意力机制综述

专知会员服务

80+阅读 · 2021年1月26日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

274+阅读 · 2020年11月26日

【万字长文】注意力机制可解释大论述

专知会员服务

50+阅读 · 2020年11月17日

自然语言处理中的注意力机制，Attention in Natural Language Processing

自然语言处理中的注意力机制，Attention in Natural Language Processing

专知会员服务

131+阅读 · 2020年5月30日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

90+阅读 · 2020年4月18日

【斯坦福大学AI】BERT, ELMo， & GPT-2:上下文化的单词表示是怎样的?

专知会员服务

34+阅读 · 2020年3月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

热门VIP内容

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

人工智能 | AAAI 2019等国际会议信息7条

人工智能 | AAAI 2019等国际会议信息7条

Call4Papers

4+阅读 · 2018年9月3日

【论文推荐】最新六篇序列推荐相关论文—卷积序列嵌入学习、用户记忆网络、上下文GRU、迁移学习

【论文推荐】最新六篇序列推荐相关论文—卷积序列嵌入学习、用户记忆网络、上下文GRU、迁移学习

专知

10+阅读 · 2018年4月8日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction

Arxiv

0+阅读 · 2021年11月4日

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Arxiv

10+阅读 · 2021年10月12日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

An Attentive Survey of Attention Models

Arxiv

17+阅读 · 2019年4月5日

Context in Neural Machine Translation: A Review of Models and Evaluations

Arxiv

5+阅读 · 2019年1月25日

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Arxiv

4+阅读 · 2019年1月9日

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Self-Attention with Relative Position Representations

Arxiv

14+阅读 · 2018年3月6日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

微信扫码咨询专知VIP会员