Model transparency is a prerequisite in many domains and an increasingly popular area in machine learning research. In the medical domain, for instance, unveiling the mechanisms behind a disease often has higher priority than the diagnosis itself, since it may dictate or guide potential treatments and research directions. One of the most popular approaches for explaining a model's global predictions is permutation importance, where the model's performance on permuted data is benchmarked against its baseline performance. However, this method and other related approaches undervalue the importance of a feature in the presence of covariates, since covariates carry part of the information that the feature provides. To address this issue, we propose Covered Information Disentanglement (CID), a method that accounts for all feature information overlap to correct the values produced by permutation importance. We further show how to compute CID efficiently when coupled with Markov random fields. We demonstrate its efficacy in adjusting permutation importance, first on a controlled toy dataset, and then discuss its effect on real-world medical data.
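For reference, the following is a minimal sketch of the standard permutation-importance procedure that the abstract describes (not CID itself): each feature column is shuffled in turn and the resulting drop in score relative to the unpermuted baseline is recorded. The `model.predict` interface, the `score_fn` callable, and all names here are illustrative assumptions, not part of the paper's method.

```python
import numpy as np

def permutation_importance(model, X, y, score_fn, n_repeats=10, seed=None):
    """Illustrative permutation importance: shuffle one column at a time
    and measure the drop in score versus the unpermuted baseline.

    Assumes `model` exposes a scikit-learn-style `predict(X)` method and
    `score_fn(y_true, y_pred)` returns a higher-is-better score.
    """
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Permuting column j breaks its association with the target
            # while preserving its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score_fn(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances
```

As the abstract notes, this estimate is biased downward for a feature whose information is partly covered by correlated covariates: permuting that feature costs the model little, because the covariates still supply the overlapping information. CID is proposed to correct for exactly this overlap.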