In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. In contrast to traditional methods that assign a single importance value to each feature, our approach is based on three metrics per feature: the mutual information shared with the target variable, the feature's contribution to synergistic information, and the amount of this information that is redundant with other features. In particular, we develop a novel procedure based on these three metrics, which reveals not only how features are correlated with the target but also the additional and overlapping information that emerges when they are considered in combination with other features. We extensively evaluate PIDF on both synthetic and real-world data, demonstrating its effectiveness and potential applications through case studies from genetics and neuroscience.
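For reference, the synergy and redundancy mentioned above are the components of the standard partial information decomposition of Williams and Beer; as an illustrative sketch (the notation $\mathrm{Red}$, $\mathrm{Unq}$, and $\mathrm{Syn}$ is used here only for illustration, and the per-feature metrics of PIDF build on these notions rather than restating this identity verbatim), the information that two features $X_1$ and $X_2$ jointly carry about the target $Y$ decomposes as
\begin{equation*}
  I(Y; X_1, X_2) =
  \underbrace{\mathrm{Red}(Y; X_1, X_2)}_{\text{redundant}}
  + \underbrace{\mathrm{Unq}(Y; X_1 \setminus X_2)}_{\text{unique to } X_1}
  + \underbrace{\mathrm{Unq}(Y; X_2 \setminus X_1)}_{\text{unique to } X_2}
  + \underbrace{\mathrm{Syn}(Y; X_1, X_2)}_{\text{synergistic}},
\end{equation*}
where redundant information about $Y$ is available from either feature alone, while synergistic information is available only from the two features taken together.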