Accounting for important interaction effects can improve the predictive performance of many statistical learning models. Identifying the relevant interactions, however, is challenging because the number of candidate interactions is ultrahigh-dimensional. Interaction screening strategies can alleviate this difficulty. However, owing to the heavier-tailed distributions and complex dependence structure of interaction effects, innovative robust and/or model-free screening methods are required to scale the analysis of complex, high-throughput data. In this work, we develop a new model-free interaction screening method, termed the Kendall Interaction Filter (KIF), for classification in high-dimensional settings. The KIF method proposes a weighted-sum measure, which compares the overall Kendall's $\tau$ of a pair of predictors with its within-cluster counterparts, to select interacting pairs of features. The proposed KIF measure captures interactions relevant to the cluster response variable; handles continuous, categorical, or mixed continuous-categorical features; and is invariant under monotonic transformations. We show that the KIF measure enjoys the sure screening property in the high-dimensional setting under mild conditions, without imposing sub-exponential moment assumptions on the features' distributions. We illustrate the favorable behavior of the proposed methodology compared with methods in the same category using simulation studies, and we conduct real data analyses to demonstrate its utility.
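To make the idea of comparing overall and within-cluster Kendall's $\tau$ concrete, the following is a minimal sketch and not the exact KIF statistic, whose weights and discrepancy function are not given in this abstract. It assumes class-proportion weights and a squared difference between the within-class and overall $\tau$; the function name `kif_score` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np
from scipy.stats import kendalltau


def kif_score(xj, xk, y):
    """Illustrative weighted-sum score for one pair of predictors.

    ASSUMPTION: weights are class proportions and the discrepancy is the
    squared difference between within-class and overall Kendall's tau.
    The paper's actual definition may differ.
    """
    xj, xk, y = np.asarray(xj), np.asarray(xk), np.asarray(y)
    tau_all, _ = kendalltau(xj, xk)  # overall association of the pair
    score = 0.0
    for label in np.unique(y):
        mask = y == label
        tau_c, _ = kendalltau(xj[mask], xk[mask])  # within-class association
        score += mask.mean() * (tau_c - tau_all) ** 2
    return score


# Simulated example: x1 and x2 are positively associated in class 1 and
# negatively associated in class 0, so the pair interacts with the class label.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
x1 = rng.normal(size=n)
x2 = np.where(y == 1, x1, -x1) + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)  # unrelated to x1 in both classes

score_interacting = kif_score(x1, x2, y)
score_null = kif_score(x1, x3, y)

# Strictly increasing transformations preserve ranks, hence Kendall's tau.
score_transformed = kif_score(np.exp(x1), x2 ** 3, y)
```

In this sketch the interacting pair receives a much larger score than the null pair, and the score is unchanged after applying strictly increasing transformations to each feature, which mirrors the invariance property stated above. Screening would then rank all pairs by such a score and retain the top ones; the cutoff rule is left unspecified here.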