Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies

Human explanations of natural language, known as rationales, are a tool for assessing whether models learn a label for the right reasons or rely on dataset-specific shortcuts. Sufficiency is a common metric for estimating the informativeness of rationales, but it provides limited insight into how rationale information affects model performance. We address this limitation by relating sufficiency to two modelling paradigms: the ability of models to identify which tokens are part of the rationale (through token classification) and the ability to improve model performance by incorporating rationales in the input (through attention regularisation). We find that highly informative rationales are not likely to help classify the instance correctly. Instead, sufficiency captures the classification impact of the non-rationalised context, which interferes with the rationale information in the same input. We also find that incorporating rationale information in model inputs can boost cross-domain classification, but results are inconsistent across tasks and model types. Finally, sufficiency and token classification appear to be unrelated. These results exemplify the complexity of rationales and show that metrics capable of systematically capturing this type of information merit further investigation.
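To make the sufficiency metric concrete, the following is a minimal sketch of one common formulation: the model's probability for the gold label on the full input minus its probability when only the rationale tokens are kept. This formulation is an assumption for illustration and may differ from the exact definition used in the paper. The names `sufficiency` and `toy_predict_proba`, and the keyword-counting classifier, are hypothetical stand-ins for a trained model.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: takes a token list, returns class probabilities.
ProbaFn = Callable[[Sequence[str]], List[float]]


def sufficiency(
    predict_proba: ProbaFn,
    tokens: Sequence[str],
    rationale_mask: Sequence[int],
    label: int,
) -> float:
    """Return p(label | full input) - p(label | rationale tokens only).

    Assumed formulation: values near zero (or below) suggest the rationale
    alone supports the prediction about as well as the full input.
    """
    if len(tokens) != len(rationale_mask):
        raise ValueError("tokens and rationale_mask must have the same length")
    rationale_only = [t for t, keep in zip(tokens, rationale_mask) if keep]
    return predict_proba(tokens)[label] - predict_proba(rationale_only)[label]


def toy_predict_proba(tokens: Sequence[str]) -> List[float]:
    """Toy keyword classifier (not a real model): [p(negative), p(positive)]."""
    pos = sum(t == "great" for t in tokens)
    neg = sum(t == "boring" for t in tokens)
    p_pos = (1 + pos) / (2 + pos + neg)  # add-one smoothing over two classes
    return [1 - p_pos, p_pos]


tokens = ["the", "plot", "was", "boring", "but", "the", "acting", "was", "great"]
mask = [0, 0, 0, 0, 0, 0, 0, 0, 1]  # only "great" is marked as rationale
score = sufficiency(toy_predict_proba, tokens, mask, label=1)
print(round(score, 3))  # -0.167
```

In this toy case the score is negative: the classifier is more confident in the positive label when it sees only the rationale than when it sees the full input, because the non-rationale token "boring" pulls the prediction the other way. It is meant only to illustrate how a sufficiency score can reflect the influence of the surrounding context as well as the rationale itself.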
