CID: Measuring Feature Importance Through Counterfactual Distributions

Assessing the importance of individual features in machine learning is critical to understanding a model's decision-making process. While numerous methods exist, the lack of a definitive ground truth for comparison highlights the need for alternative, well-founded measures. This paper introduces a novel post-hoc local feature importance method called Counterfactual Importance Distribution (CID). We generate two sets of counterfactuals, positive and negative, model their distributions using Kernel Density Estimation, and rank features based on a distributional dissimilarity measure. This measure, grounded in a rigorous mathematical framework, satisfies key properties required to function as a valid metric. We showcase the effectiveness of our method by comparing it with well-established local feature importance explainers. Our method not only offers perspectives complementary to existing approaches, but also improves performance on faithfulness metrics (both comprehensiveness and sufficiency), resulting in more faithful explanations of the system. These results highlight its potential as a valuable tool for model analysis.

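The pipeline described in the abstract (two counterfactual sets, a density estimate for each, and a per-feature dissimilarity score used for ranking) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the function name `rank_features_by_kde_dissimilarity` is hypothetical, the counterfactual sets are taken as given arrays because the abstract does not say how they are generated, each feature is modelled with a one-dimensional Gaussian KDE from SciPy, and the Jensen-Shannon distance stands in for the paper's dissimilarity measure, which the abstract does not specify.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.spatial.distance import jensenshannon


def rank_features_by_kde_dissimilarity(pos_cf, neg_cf, grid_size=256):
    """Rank features by how much their KDEs differ between two sets.

    pos_cf, neg_cf: arrays of shape (n_samples, n_features) holding the
    positive and negative counterfactuals (assumed to be precomputed).
    Assumes every feature has non-zero variance in both sets, since
    gaussian_kde cannot fit constant data.
    Returns (order, scores): feature indices from most to least
    dissimilar, and the per-feature scores.
    """
    pos_cf = np.asarray(pos_cf, dtype=float)
    neg_cf = np.asarray(neg_cf, dtype=float)
    n_features = pos_cf.shape[1]
    scores = np.empty(n_features)

    for j in range(n_features):
        a, b = pos_cf[:, j], neg_cf[:, j]
        # Evaluate both densities on a shared grid covering both samples.
        lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
        pad = 0.1 * (hi - lo)
        grid = np.linspace(lo - pad, hi + pad, grid_size)
        p = gaussian_kde(a)(grid)
        q = gaussian_kde(b)(grid)
        # Stand-in measure (an assumption): Jensen-Shannon distance.
        # SciPy normalizes p and q to sum to 1 before comparing them.
        scores[j] = jensenshannon(p, q, base=2)

    order = np.argsort(-scores)
    return order, scores


# Synthetic demo: feature 0 differs between the sets, feature 1 does not.
rng = np.random.default_rng(0)
pos = np.column_stack([rng.normal(3.0, 1.0, 200), rng.normal(0.0, 1.0, 200)])
neg = np.column_stack([rng.normal(-3.0, 1.0, 200), rng.normal(0.0, 1.0, 200)])
order, scores = rank_features_by_kde_dissimilarity(pos, neg)
print("ranking:", order, "scores:", scores)
```

The Jensen-Shannon distance was chosen here only because it is symmetric, bounded, and satisfies the triangle inequality, which matches the abstract's requirement that the measure behave as a valid metric; any other measure with those properties could be substituted in the marked line.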