It is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption by focusing on one particular aspect of interpretability, namely the plausibility of models. Roughly speaking, we equate the plausibility of a model with the likelihood that a user accepts it as an explanation for a prediction. In particular, we argue that, all other things being equal, longer explanations may be more convincing than shorter ones, and that the predominant bias for shorter models, which is typically necessary for learning powerful discriminative models, may not be suitable when it comes to user acceptance of the learned models. To that end, we first recapitulate evidence for and against this postulate, and then report the results of an evaluation in a crowd-sourcing study based on about 3,000 judgments. The results do not reveal a strong preference for simple rules, whereas we can observe a weak preference for longer rules in some domains. We then relate these results to well-known cognitive biases such as the conjunction fallacy, the representativeness heuristic, or the recognition heuristic, and investigate their relation to rule length and plausibility.
Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the classification problem as an optimization problem and do not address the issue of statistical significance. In this paper, we formulate the binary classification problem as a two-sample testing problem. More precisely, our classification model is a generic framework that is composed of two steps. In the first step, the distance between the test instance and each training instance is calculated to derive two distance sets, one per class. In the second step, a two-sample test is performed under the null hypothesis that the two sets of distances are drawn from the same cumulative distribution. After these two steps, we have two p-values for each test instance, and the test instance is assigned to the class associated with the smaller p-value. Essentially, the presented classification method can be regarded as an instance-based classifier based on hypothesis testing. The experimental results on 40 real data sets show that our method achieves the same level of performance as state-of-the-art classifiers and performs significantly better than existing testing-based classifiers. Furthermore, we can handle outlying instances and control the false discovery rate of test instances assigned to each class under the same framework.
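The two-step procedure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the distance measure or the two-sample test, so Euclidean distance and a one-sided rank-sum (Mann-Whitney) test with a normal approximation are assumptions made here, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_sided_rank_sum_p(xs, ys):
    """Approximate p-value for H1: values in xs tend to be smaller than those in ys.

    Assumed test: Mann-Whitney U with a normal approximation,
    without tie or continuity correction.
    """
    n, m = len(xs), len(ys)
    # U counts the (x, y) pairs with x < y; ties contribute one half.
    u = sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    # A large U supports H1, so the p-value is the upper tail P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def classify(x, pos_train, neg_train):
    """Return (label, p_pos, p_neg) for the test instance x."""
    # Step 1: derive the two distance sets, one per class.
    d_pos = [euclidean(x, t) for t in pos_train]
    d_neg = [euclidean(x, t) for t in neg_train]
    # Step 2: one test per class; a small p-value is evidence that
    # x lies closer to that class than to the other.
    p_pos = one_sided_rank_sum_p(d_pos, d_neg)
    p_neg = one_sided_rank_sum_p(d_neg, d_pos)
    label = "pos" if p_pos <= p_neg else "neg"
    return label, p_pos, p_neg


if __name__ == "__main__":
    pos = [(0, 0), (0, 1), (1, 0), (1, 1)]
    neg = [(5, 5), (5, 6), (6, 5), (6, 6)]
    print(classify((0.5, 0.5), pos, neg))
    print(classify((5.5, 5.5), pos, neg))
```

In this sketch the two p-values are returned alongside the label, so a caller could apply a significance threshold to them, for example to flag an instance as outlying when neither p-value is small. How the paper itself handles outliers and false discovery rate control is not specified in the abstract.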