记忆增强语言培训模式 (Training Language Models with Memory Augmentation) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · 可约的 · Perplexity · 讲稿 ·

2022 年 11 月 29 日

Training Language Models with Memory Augmentation

翻译：记忆增强语言培训模式

Zexuan Zhong,Tao Lei,Danqi Chen

from arxiv, EMNLP 2022. Our code and models are available at https://github.com/princeton-nlp/TRIME

Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce mem-ories at testing time or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used for adapting to different sets of memories--local, long-term, and external memory--at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it is able to achieve significant improvements across all the settings. Concretely, TRIME reduces the perplexity from 18.70 to 15.37 on WIKITEXT-103, by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.

翻译：最近的工作明显改进了语言模型,为这些模型配备了非参数内存部分,从而显著改进了语言模型(LMS),然而,大多数现有方法只是在测试时引入内聚物,或使用单独培训的编码器代表它们,从而导致语言模型的培训不尽善。在这项工作中,我们介绍了TRIME,这是一个创新的、简单的培训方法,旨在用记忆扩增来培训LMS。我们的方法使用的培训目标直接将内存示例作为可获取的记忆。我们还介绍了记忆构建和数据分类的新方法,用于适应不同系列的记忆-局部、长期和外部记忆-测试时间。我们用多种语言模型和机器翻译基准对TRIME进行评估,并表明它能够在所有环境中实现重大改进。具体地说,TRIME通过有效地利用从培训堆中获取的大型记忆集集,将TRIME从18.70到15.37的混淆从WIKITEXT-103上减少。与标准的LM培训相比,TRME增加了可忽略的计算间接费用,并且与不同的神经结构相容不小,使它成为培训的一个可变式的解决方案。

0

相关内容

语言模型化

语言模型化

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

线粒体TRAP1分子介导Ago2蛋白表达在肠癌转移中的作用机制

国家自然科学基金

0+阅读 · 2014年12月31日

多功能无机杂化功能分子的制备与性能

国家自然科学基金

0+阅读 · 2014年12月31日

飞秒、纳米时/空尺度热输运机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向前列腺癌介孔纳米硅球MR显像

国家自然科学基金

0+阅读 · 2013年12月31日

氯氧化铋量子点薄膜的制备及其光催化机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

福氏志贺氏菌HtrA蛋白功能研究

国家自然科学基金

0+阅读 · 2011年12月31日

小分子蛋白Thioredoxin -1在脑缺血再灌注损伤中的保护作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

半Heusler合金型拓扑绝缘体材料的制备和物性研究

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

去乙酰化转移酶（HDAC)抑制剂MS-275对胃癌细胞的选择性杀伤作用及机制

国家自然科学基金

0+阅读 · 2009年12月31日

Crawling the Internal Knowledge-Base of Language Models

Arxiv

0+阅读 · 2023年1月30日

Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

Arxiv

0+阅读 · 2023年1月30日

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning

Arxiv

0+阅读 · 2023年1月27日

Accelerating Domain-aware Deep Learning Models with Distributed Training

Arxiv

0+阅读 · 2023年1月25日

Data Augmentation Approaches in Natural Language Processing: A Survey

Arxiv

18+阅读 · 2021年10月5日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新型数字杀伤链：理解综合战术网络对野战炮兵体系的能力与效益

《对抗环境中运用数字孪生技术优化预测性维护与后勤保障》2025最新93页

《任务式指挥十六个案例研究》232页

《幻觉还是事实：国防大型语言模型的可信度评估研究》2025最新109页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

相关论文

Crawling the Internal Knowledge-Base of Language Models

Arxiv

0+阅读 · 2023年1月30日

Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

Arxiv

0+阅读 · 2023年1月30日

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning

Arxiv

0+阅读 · 2023年1月27日

Accelerating Domain-aware Deep Learning Models with Distributed Training

Arxiv

0+阅读 · 2023年1月25日

Data Augmentation Approaches in Natural Language Processing: A Survey

Arxiv

18+阅读 · 2021年10月5日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog

Arxiv

30+阅读 · 2020年2月27日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Arxiv

11+阅读 · 2019年10月30日

相关基金

线粒体TRAP1分子介导Ago2蛋白表达在肠癌转移中的作用机制

国家自然科学基金

0+阅读 · 2014年12月31日

多功能无机杂化功能分子的制备与性能

国家自然科学基金

0+阅读 · 2014年12月31日

飞秒、纳米时/空尺度热输运机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向前列腺癌介孔纳米硅球MR显像

国家自然科学基金

0+阅读 · 2013年12月31日

氯氧化铋量子点薄膜的制备及其光催化机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

福氏志贺氏菌HtrA蛋白功能研究

国家自然科学基金

0+阅读 · 2011年12月31日

小分子蛋白Thioredoxin -1在脑缺血再灌注损伤中的保护作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

半Heusler合金型拓扑绝缘体材料的制备和物性研究

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

去乙酰化转移酶（HDAC)抑制剂MS-275对胃癌细胞的选择性杀伤作用及机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员