Feature importance is commonly used to explain machine predictions. While feature importance can be derived from a machine learning model with a variety of methods, the consistency of feature importance across different methods remains understudied. In this work, we systematically compare feature importance from built-in mechanisms in a model, such as attention values, and from post-hoc methods that approximate model behavior, such as LIME. Using text classification as a testbed, we find that 1) regardless of the method used, important features from traditional models such as SVM and XGBoost are more similar to each other than to those from deep learning models; 2) post-hoc methods tend to generate more similar important features for two models than built-in methods do. We further demonstrate how such similarity varies across instances. Notably, important features do not always resemble each other more closely when two models agree on the predicted label than when they disagree.
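One natural way to quantify the consistency of important features across two models or methods is the overlap between their top-k feature sets (e.g., Jaccard similarity). A minimal sketch of this comparison follows; the importance scores here are randomly generated stand-ins for real model outputs (e.g., SVM coefficient magnitudes or attention-derived scores), and the variable names are purely illustrative:

```python
import numpy as np

def topk_jaccard(imp_a, imp_b, k=10):
    """Jaccard similarity between the top-k most important feature indices
    of two importance vectors defined over the same feature vocabulary."""
    top_a = set(np.argsort(-np.abs(imp_a))[:k])
    top_b = set(np.argsort(-np.abs(imp_b))[:k])
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical importance scores from two models over a shared 100-word vocabulary.
rng = np.random.default_rng(0)
imp_svm = rng.normal(size=100)   # stand-in for, e.g., SVM coefficient magnitudes
imp_deep = rng.normal(size=100)  # stand-in for, e.g., attention-derived scores

print(round(topk_jaccard(imp_svm, imp_deep, k=10), 3))
```

A score of 1.0 means the two methods pick out the same top-k features; 0.0 means no overlap. Aggregating this measure over many instances is one way to compare built-in and post-hoc explanations across model pairs.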