From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge for downstream tasks and ignore the essential task-agnostic knowledge during pruning, which may cause the catastrophic forgetting problem and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in the pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn task-agnostic knowledge from the pre-trained model and task-specific knowledge from the fine-tuned model. Moreover, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of the model parameters retained (i.e., 97% sparsity), CAP achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks, respectively. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
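
To make the "unified in contrastive learning" idea concrete, here is a minimal sketch of an InfoNCE-style objective in which the pruned model's representation of each example is pulled toward the representations of the same example from several teacher models (the pre-trained model, the fine-tuned model, and any pruning snapshots) and pushed away from the teachers' representations of other examples in the batch. This is an assumption-laden illustration, not the paper's exact loss: the function names, the cosine similarity, the temperature value, the equal weighting of teachers, and the use of instance-level (label-free) positives are all hypothetical choices made here for clarity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(anchor: Vector, positives: List[Vector],
             negatives: List[Vector], temperature: float) -> float:
    """Multi-positive InfoNCE: average over positives p of
    -log(exp(s(a,p)/t) / (exp(s(a,p)/t) + sum_n exp(s(a,n)/t)))."""
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    losses = []
    for p in positives:
        pos_logit = cosine(anchor, p) / temperature
        losses.append(logsumexp([pos_logit] + neg_logits) - pos_logit)
    return sum(losses) / len(losses)


def contrastive_pruning_loss(pruned: List[Vector],
                             pretrained: List[Vector],
                             finetuned: List[Vector],
                             snapshots: List[List[Vector]],
                             temperature: float = 0.1) -> float:
    """Hypothetical CAP-style loss over one batch.

    pruned[i] is the pruned model's representation of example i; each
    teacher list holds representations of the same batch. Positives for
    example i are the teachers' views of example i; negatives are the
    teachers' views of every other example j != i.
    """
    teachers = [pretrained, finetuned] + snapshots
    batch = len(pruned)
    total = 0.0
    for i in range(batch):
        positives = [t[i] for t in teachers]
        negatives = [t[j] for t in teachers for j in range(batch) if j != i]
        total += info_nce(pruned[i], positives, negatives, temperature)
    return total / batch


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    aligned = contrastive_pruning_loss(teacher, teacher, teacher, [teacher])
    swapped = contrastive_pruning_loss([[0.0, 1.0], [1.0, 0.0]],
                                       teacher, teacher, [teacher])
    print(aligned, swapped)
```

When the pruned model's representations match the teachers' views of the same examples, the loss is close to zero; swapping the two examples' representations makes every positive look like a negative and the loss grows sharply. Treating snapshots as extra teachers is one simple way to let intermediate pruned models act as supervision, as the abstract describes; how the paper actually weights or combines these terms is not specified here.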