Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With their large number of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches consider only the task-specific knowledge needed for downstream tasks and ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization. To maintain both task-agnostic and task-specific knowledge in the pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn task-agnostic knowledge from the pre-trained model and task-specific knowledge from the fine-tuned model. In addition, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of the model parameters retained (i.e., 97% sparsity), CAP achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks, respectively. Furthermore, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
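
The abstract describes the pruned model learning, through contrastive learning, from three sources: the pre-trained model, the fine-tuned model, and the pruning snapshots. As a rough illustration only, the sketch below uses a generic InfoNCE-style contrastive loss over plain Python vectors. It is not the authors' implementation: the function names (`info_nce`, `combined_loss`), the use of cosine similarity, the temperature value, the choice of negatives, and the unweighted sum over teachers are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: pull anchor toward positive, away from negatives.

    anchor    -- representation from the pruned model (assumed)
    positive  -- representation of the same input from a teacher model (assumed)
    negatives -- representations of other inputs (assumed)
    """
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos

def combined_loss(pruned, teachers, negatives, temperature=0.1):
    """Sum the contrastive loss over several teachers (hypothetical weighting).

    teachers maps a label such as "pre-trained", "fine-tuned", or "snapshot"
    to that model's representation of the same input.
    """
    return sum(info_nce(pruned, t, negatives, temperature)
               for t in teachers.values())

if __name__ == "__main__":
    pruned = [0.9, 0.1]
    teachers = {
        "pre-trained": [1.0, 0.0],
        "fine-tuned": [0.8, 0.2],
        "snapshot": [0.9, 0.1],
    }
    negatives = [[0.0, 1.0], [-1.0, 0.0]]
    print(combined_loss(pruned, teachers, negatives))
```

In this sketch the loss is small when the pruned representation aligns with its teachers and large when it aligns with a negative instead, which is the behavior a contrastive objective is meant to reward.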



[AAAI 2022] Pruning and Compression of Pre-trained Language Models Based on Contrastive Learning