Interpretability is becoming increasingly important in the analysis of predictive models. Unfortunately, as many authors have remarked, there is still no consensus on what this notion means. The goal of this article is to propose a definition of interpretability that allows rule-based algorithms to be compared. The definition consists of three terms, each measured quantitatively with a simple formula: predictivity, stability and simplicity. Predictivity has been studied extensively as a measure of the accuracy of predictive algorithms. Stability is based on the Dice-Sorensen index, which compares two sets of rules generated by an algorithm from two independent samples. Simplicity is based on the sum of the lengths of the rules derived from the predictive model. The new measure of the interpretability of a rule-based algorithm is a weighted sum of these three terms. We use it to compare the interpretability of several rule-based algorithms: CART, RuleFit, Node Harvest, the Covering algorithm and SIRUS for regression, and CART, PART and RIPPER for classification.
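To make the three ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes a rule is represented as a `frozenset` of `(feature, operator, threshold)` conditions and that a rule's length is its number of conditions. The helper names (`dice_sorensen`, `total_rule_length`, `relative_simplicity`, `interpretability`) are hypothetical. The normalization in `relative_simplicity` (shortest total length divided by each model's total length) is an illustrative assumption, as are the equal default weights. Predictivity is taken as an input score because the abstract does not give its formula.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A condition is (feature, operator, threshold), e.g. ("x1", "<=", 0.5).
Condition = Tuple[str, str, float]
# A rule is the conjunction of its conditions; frozenset makes it hashable,
# so rule sets can be compared with ordinary set operations.
Rule = FrozenSet[Condition]


def dice_sorensen(rules_a: Set[Rule], rules_b: Set[Rule]) -> float:
    """Dice-Sorensen index 2|A & B| / (|A| + |B|) between two rule sets."""
    total = len(rules_a) + len(rules_b)
    if total == 0:
        # Convention assumed here: two empty rule sets count as identical.
        return 1.0
    return 2 * len(rules_a & rules_b) / total


def total_rule_length(rules: Iterable[Rule]) -> int:
    """Sum of rule lengths, where a rule's length is its number of conditions."""
    return sum(len(rule) for rule in rules)


def relative_simplicity(lengths: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical normalization to (0, 1]: shortest total length / own length."""
    if any(length <= 0 for length in lengths.values()):
        raise ValueError("total rule lengths must be positive")
    shortest = min(lengths.values())
    return {name: shortest / length for name, length in lengths.items()}


def interpretability(
    predictivity: float,
    stability: float,
    simplicity: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted sum of the three scores, each assumed to lie in [0, 1]."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    w_pred, w_stab, w_simp = weights
    return w_pred * predictivity + w_stab * stability + w_simp * simplicity


if __name__ == "__main__":
    r1 = frozenset({("x1", "<=", 0.5)})
    r2 = frozenset({("x1", ">", 0.5), ("x2", "<=", 1.0)})
    r3 = frozenset({("x3", ">", 2.0)})
    # Rule sets the same algorithm might produce on two independent samples.
    sample_1, sample_2 = {r1, r2}, {r1, r3}
    stab = dice_sorensen(sample_1, sample_2)      # 2 * 1 / (2 + 2)
    length = total_rule_length(sample_1)          # 1 + 2
    simp = relative_simplicity({"A": length, "B": 6})["A"]
    print(stab, length, interpretability(0.8, stab, simp))
```

Because the index only counts rules that match exactly, two rule sets share credit only for identical rules. With continuous features, the thresholds would presumably need to be discretized for this comparison to be meaningful.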