Early exiting has proven effective at accelerating inference for pre-trained language models such as BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods rely only on local information from an individual test sample to compute their exiting indicators and fail to leverage the global information offered by the sample population. This leads to suboptimal estimation of prediction correctness and, in turn, to erroneous exiting decisions. To bridge this gap, we explore the necessity of effectively combining local and global information to ensure reliable early exiting during inference. To this end, we leverage prototypical networks to learn class prototypes and devise a distance metric between samples and class prototypes, which enables us to use global information when estimating the correctness of early predictions. On this basis, we propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT). DE$^3$-BERT implements a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information, improving the estimation of prediction correctness and yielding more reliable early exiting decisions. Extensive experiments on the GLUE benchmark demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models under different speed-up ratios with minimal storage and computational overhead, yielding a better trade-off between model performance and inference efficiency. An in-depth analysis further validates the generality and interpretability of our method.
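The hybrid exiting idea described above can be illustrated with a minimal sketch. It is not the DE$^3$-BERT implementation: the function names, the normalized distance ratio, the weighted-sum combination, and the `weight` and `threshold` parameters are all assumptions made for illustration, since the abstract does not specify how the entropy-based and distance-based signals are fused. Each layer is assumed to provide classifier logits, a sample embedding, and one prototype per class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Local signal: entropy of the prediction, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_ratio(embedding, prototypes, predicted):
    """Global signal (assumed form): distance to the predicted class
    prototype divided by the summed distance to all prototypes.
    Small values mean the sample sits close to its predicted class."""
    dists = [euclidean(embedding, p) for p in prototypes]
    total = sum(dists)
    return 0.0 if total == 0.0 else dists[predicted] / total

def hybrid_score(logits, embedding, prototypes, weight=0.5):
    """Assumed fusion: a weighted sum of the local and global signals.
    Returns (score, predicted_class); lower score = more confident."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    local = normalized_entropy(probs)
    global_ = distance_ratio(embedding, prototypes, predicted)
    return (1.0 - weight) * local + weight * global_, predicted

def early_exit(layers, threshold=0.3, weight=0.5):
    """Walk the layers in order and exit at the first one whose hybrid
    score falls below the threshold; otherwise use the last layer.
    Each element of `layers` is (logits, embedding, prototypes)."""
    predicted = None
    for index, (logits, embedding, prototypes) in enumerate(layers):
        score, predicted = hybrid_score(logits, embedding, prototypes, weight)
        if score < threshold:
            return index, predicted
    return len(layers) - 1, predicted

# Toy two-class example with hypothetical values.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
layers = [
    ([0.1, 0.0], [0.5, 0.5], prototypes),  # uncertain and equidistant
    ([4.0, 0.0], [0.1, 0.0], prototypes),  # confident and near class 0
]
exit_layer, label = early_exit(layers, threshold=0.3, weight=0.5)
print(exit_layer, label)
```

In this sketch a sample exits only when it has both a low-entropy prediction and an embedding close to the prototype of the predicted class, so a confident prediction that lies far from its class prototype is passed on to deeper layers.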