This paper introduces Syntactic Attention Pruning (SAP), a novel method for effectively pruning attention heads in Transformer models. Unlike conventional approaches that rely solely on mathematical analysis of model weights and activations, SAP incorporates both the syntactic structure and attention patterns of sentences to guide the pruning process. By leveraging these linguistic features, SAP not only achieves performance comparable to state-of-the-art methods but also enhances the interpretability of model behavior. To further improve robustness, we propose Candidate Filtering (CF), a mechanism that prioritizes heads based on their contribution to model performance, mitigating degradation during pruning. Experimental results indicate that SAP effectively preserves critical heads with a high density of strong attention values, outperforming existing head pruning strategies in retrain-free settings. These findings position SAP as a promising foundation for a new direction in model compression research, offering high flexibility for pruning across all Transformer-based language models.
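The abstract stays at a high level; as a rough illustration of the head-scoring idea it describes, the sketch below ranks each attention head by a combination of its attention density (the fraction of strong attention values) and its overlap with a dependency parse, then marks the lowest-ranked heads for pruning. All names (`attention_density`, `syntactic_alignment`, the threshold `tau`, the mixing weight `alpha`) and the exact scoring formula are illustrative assumptions, not the paper's definition of SAP or Candidate Filtering.

```python
import torch


def attention_density(attn, tau=0.1):
    """Fraction of attention weights above tau, per head.

    attn: (num_heads, seq_len, seq_len) attention probabilities for
    one layer on a single sentence.
    """
    return (attn > tau).float().mean(dim=(1, 2))  # (num_heads,)


def syntactic_alignment(attn, dep_edges):
    """Mean attention mass placed on syntactically related token pairs.

    dep_edges: boolean (seq_len, seq_len) matrix, True where two tokens
    are linked in the dependency parse (a stand-in for the syntactic
    structure SAP consults).
    """
    return (attn * dep_edges).sum(dim=(1, 2)) / dep_edges.sum().clamp(min=1)


def select_heads_to_prune(attn, dep_edges, keep_ratio=0.7, alpha=0.5):
    """Rank heads by a combined syntax/attention score and return the
    indices of the lowest-ranked heads, which would then be masked out
    without any retraining."""
    score = alpha * attention_density(attn) + (1 - alpha) * syntactic_alignment(attn, dep_edges)
    num_heads = attn.shape[0]
    num_keep = int(round(keep_ratio * num_heads))
    order = torch.argsort(score, descending=True)
    return order[num_keep:]  # heads with the weakest scores
```

In this reading, Candidate Filtering would correspond to an additional pass that re-ranks or vetoes candidates whose removal measurably hurts task performance before the mask is applied.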