Estimating the predictive uncertainty of pre-trained language models is important for increasing their trustworthiness in NLP. Although much prior work focuses on quantifying prediction uncertainty, little work explains it. This paper goes a step further and explains the uncertain predictions of post-calibrated pre-trained language models. We adapt two perturbation-based post-hoc interpretation methods, Leave-one-out and Sampling Shapley, to identify the words in an input that cause the uncertainty of a prediction. We evaluate the proposed methods on BERT and RoBERTa across three tasks: sentiment classification, natural language inference, and paraphrase identification, in both in-domain and out-of-domain settings. Experiments show that both methods consistently capture the words in inputs that cause prediction uncertainty.
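To make the two attribution schemes concrete, below is a minimal Python sketch, not the paper's implementation. It assumes a hypothetical callable `predict_proba` that maps a list of tokens to a class-probability vector (e.g., the softmax output of a calibrated BERT or RoBERTa classifier), and it uses predictive entropy as the uncertainty measure; the paper's exact value function may differ.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of a class-probability vector; higher means more uncertain."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs))

def leave_one_out_uncertainty(tokens, predict_proba):
    """Leave-one-out attribution: a token's score is the drop in
    predictive entropy when that token is removed from the input.
    Positive scores mark words that cause the uncertainty."""
    base_entropy = predictive_entropy(predict_proba(tokens))
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(base_entropy - predictive_entropy(predict_proba(reduced)))
    return scores

def sampling_shapley_uncertainty(tokens, predict_proba, num_samples=100, rng=None):
    """Sampling Shapley attribution: Monte-Carlo estimate of each token's
    Shapley value under the entropy value function, using random
    permutations and marginal contributions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(tokens)
    shapley = np.zeros(n)
    for _ in range(num_samples):
        perm = rng.permutation(n)
        kept = []
        # Value of the empty input; assumes predict_proba handles it.
        prev_entropy = predictive_entropy(predict_proba([]))
        for idx in perm:
            kept.append(idx)
            subset = [tokens[i] for i in sorted(kept)]  # keep original word order
            cur_entropy = predictive_entropy(predict_proba(subset))
            shapley[idx] += cur_entropy - prev_entropy
            prev_entropy = cur_entropy
    return shapley / num_samples
```

Under these assumptions, tokens with large positive scores are the words whose presence most increases the model's predictive entropy, i.e., the words identified as causing the prediction uncertainty.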