压缩器: 高效低射速超复合适应层 (Compacter: Efficient Low-Rank Hypercomplex Adapter Layers) - 专知论文

会员服务 ·

0

Performer · Weight · MoDELS · 语言模型化 · 层 ·

2021 年 6 月 8 日

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

翻译：压缩器: 高效低射速超复合适应层

Rabeeh Karimi Mahabadi,James Henderson,Sebastian Ruder

Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared ``slow'' weights and ``fast'' rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms fine-tuning in low-resource settings. Our code is publicly available in https://github.com/rabeehk/compacter/

翻译：通过微调使大规模预先培训的语言模型适应下游任务,是达到NLP基准最新业绩的标准方法。然而,微调模型的所有重量,加上数百万或数十亿参数,抽样效率低,低资源环境不稳定,浪费性,因为它需要为每项任务储存一个单独的模型副本。最近的工作已经开发出具有参数效率的微调方法,但这些方法仍需要数量相对较多的参数或不完善的标准微调。在这项工作中,我们提议Claimer,这是对大型语言模型进行微调的一种方法,在任务业绩和可培训参数数目之间作出更好的权衡。在适应者、低级别优化和参数化超复杂化的多变层中,对模型进行抽样化。具体地说,Claimer将特定任务重量矩阵插入一个经过预先训练的模型重量中,在共享的'sslow'重量和`fast'级标准/Siral-one 矩阵中,在任务性调整前的常规结构中,只对标准/Grassimer 进行升级的测试。

2

相关内容

Performer

SIGIR2021接受论文列表公布！151篇论文都在这了！

专知会员服务

37+阅读 · 2021年4月27日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

106+阅读 · 2020年5月15日

【ICLR2020-哥伦比亚大学】多关系图神经网络CompGCN

【ICLR2020-哥伦比亚大学】多关系图神经网络CompGCN

专知会员服务

49+阅读 · 2020年4月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

59+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

109+阅读 · 2020年3月12日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

91+阅读 · 2019年10月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Bite-Weight Estimation Using Commercial Ear Buds

Arxiv

0+阅读 · 2021年8月2日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

Arxiv

8+阅读 · 2021年5月5日

Zero-shot Adversarial Quantization

Arxiv

6+阅读 · 2021年3月30日

LogME: Practical Assessment of Pre-trained Models for Transfer Learning

Arxiv

4+阅读 · 2021年2月22日

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Arxiv

15+阅读 · 2020年7月1日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Improved Deep Embeddings for Inferencing with Multi-Layered Networks

Improved Deep Embeddings for Inferencing with Multi-Layered Networks

Arxiv

3+阅读 · 2019年3月1日

Quantizing deep convolutional networks for efficient inference: A whitepaper

Quantizing deep convolutional networks for efficient inference: A whitepaper

Arxiv

6+阅读 · 2018年6月21日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

SIGIR2021接受论文列表公布！151篇论文都在这了！

专知会员服务

37+阅读 · 2021年4月27日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

106+阅读 · 2020年5月15日

【ICLR2020-哥伦比亚大学】多关系图神经网络CompGCN

【ICLR2020-哥伦比亚大学】多关系图神经网络CompGCN

专知会员服务

49+阅读 · 2020年4月2日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

59+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

109+阅读 · 2020年3月12日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

92+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

91+阅读 · 2019年10月16日

热门VIP内容

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Bite-Weight Estimation Using Commercial Ear Buds

Arxiv

0+阅读 · 2021年8月2日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

Arxiv

8+阅读 · 2021年5月5日

Zero-shot Adversarial Quantization

Arxiv

6+阅读 · 2021年3月30日

LogME: Practical Assessment of Pre-trained Models for Transfer Learning

Arxiv

4+阅读 · 2021年2月22日

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Arxiv

15+阅读 · 2020年7月1日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Improved Deep Embeddings for Inferencing with Multi-Layered Networks

Improved Deep Embeddings for Inferencing with Multi-Layered Networks

Arxiv

3+阅读 · 2019年3月1日

Quantizing deep convolutional networks for efficient inference: A whitepaper

Quantizing deep convolutional networks for efficient inference: A whitepaper

Arxiv

6+阅读 · 2018年6月21日

微信扫码咨询专知VIP会员