Characterizing Verbatim Short-Term Memory in Neural Language Models

May 1, 2023

Kristijan Armeni, Christopher Honey, Tal Linzen

From arXiv: V2 corrects a tokenization issue for one of the models (the Wikitext-103 transformer). The relevant figures and the accompanying text were updated. This update does not affect the conclusions, which remain the same as in the previous version.

When a language model is trained to predict natural language sequences, its prediction at each moment depends on a representation of prior context. What kind of information about the prior context can language models retrieve? We tested whether language models could retrieve the exact words that occurred previously in a text. In our paradigm, language models (transformers and an LSTM) processed English text in which a list of nouns occurred twice. We operationalized retrieval as the reduction in surprisal from the first to the second list. We found that the transformers retrieved both the identity and ordering of nouns from the first list. Further, the transformers' retrieval was markedly enhanced when they were trained on a larger corpus and with greater model depth. Lastly, their ability to index prior tokens was dependent on learned attention patterns. In contrast, the LSTM exhibited less precise retrieval, which was limited to list-initial tokens and to short intervening texts. The LSTM's retrieval was not sensitive to the order of nouns and it improved when the list was semantically coherent. We conclude that transformers implemented something akin to a working memory system that could flexibly retrieve individual token representations across arbitrary delays; conversely, the LSTM maintained a coarser and more rapidly-decaying semantic gist of prior tokens, weighted toward the earliest items.
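The abstract operationalizes retrieval as the reduction in surprisal from the first to the second occurrence of the noun list. The snippet below is a minimal sketch of that idea, not the authors' implementation: it assumes the model's probability for each list token is already available, and it takes the difference of mean surprisals in bits, whereas the paper's exact aggregation (for example a median or a ratio) may differ. The function names and the example probabilities are hypothetical.

```python
import math
from statistics import mean


def surprisal(prob: float) -> float:
    """Surprisal, in bits, of a token the model assigned probability `prob`."""
    return -math.log2(prob)


def surprisal_reduction(first_probs, second_probs):
    """Mean surprisal on the first list minus mean surprisal on the second.

    A larger positive value means the tokens became more predictable when
    the list was repeated, which is read here as evidence of retrieval.
    This aggregation (difference of means) is an assumption for illustration.
    """
    first_mean = mean(surprisal(p) for p in first_probs)
    second_mean = mean(surprisal(p) for p in second_probs)
    return first_mean - second_mean


# Hypothetical per-token probabilities for a three-noun list.
first_list_probs = [0.125, 0.25, 0.125]   # first presentation: nouns are hard to predict
second_list_probs = [0.5, 0.5, 0.25]      # repeated presentation: nouns are easier to predict

reduction = surprisal_reduction(first_list_probs, second_list_probs)
print(f"surprisal reduction: {reduction:.3f} bits")
```

Under this reading, a model with no memory of the first list would show a reduction near zero, while a model that retrieves the exact tokens would show a large positive reduction.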
