Partial Information Decomposition for Data Interpretability and Feature Selection

In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Unlike traditional methods, which assign each feature a single importance value, our approach computes three metrics per feature: the mutual information it shares with the target variable, its contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics that reveals not only how features are correlated with the target but also the additional and overlapping information they provide when considered in combination with other features. We extensively evaluate PIDF on both synthetic and real-world data and demonstrate its potential applications and effectiveness through case studies from genetics and neuroscience.
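The three per-feature quantities above rest on the distinction between information a feature carries on its own, information that only appears when features are combined (synergy), and information duplicated across features (redundancy). The minimal sketch below does not implement PIDF. It computes plug-in mutual information from discrete samples and the classical interaction-information score I(X1,X2;Y) - I(X1;Y) - I(X2;Y), which reports only synergy minus redundancy as a single net number. Separating those two terms is what a partial information decomposition adds. The function names and toy data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_information(x1, x2, y):
    """Net score I(X1,X2;Y) - I(X1;Y) - I(X2;Y).

    Positive means net synergy, negative means net redundancy; it cannot
    separate the two when both are present (that is what PID refines).
    """
    joint = list(zip(x1, x2))
    return (mutual_information(joint, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# All four equally likely combinations of two binary features (hypothetical toy data).
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]

y_xor = [a ^ b for a, b in zip(x1, x2)]  # target only predictable from both features
y_copy = x1[:]                            # target copied from x1
x1_dup = x1[:]                            # a second feature duplicating x1

print(mutual_information(x1, y_xor))                # 0.0
print(interaction_information(x1, x2, y_xor))       # 1.0
print(interaction_information(x1, x1_dup, y_copy))  # -1.0
```

For the XOR target, x1 alone carries 0 bits about Y while the pair carries 1 bit, so the net score is +1 (synergy); duplicating x1 gives -1 (redundancy). A mixture of both effects can net out to zero, which is why a single number is ambiguous.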

Feature Selection

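Feature Selection, also called Feature Subset Selection (FSS) or Attribute Selection, is the process of choosing N features from an existing set of M features so that a specific criterion of the system is optimized. It selects the most effective features from the original ones to reduce the dimensionality of a dataset. It is an important means of improving the performance of learning algorithms and a key data-preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a good model.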
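Read literally, this definition is a combinatorial search: pick the N of M features that optimize a criterion. The hypothetical sketch below uses the empirical joint mutual information I(X_S; Y) between a selected subset and the target as that criterion and enumerates all C(M, N) subsets, which is only feasible for small M. The criterion, helper names, and toy data are assumptions for illustration, not a method taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def joint_mi(features, columns, target):
    """Empirical I(X_S; Y) for the feature columns indexed by `columns`."""
    rows = [tuple(features[j][i] for j in columns) for i in range(len(target))]
    return entropy(rows) + entropy(target) - entropy(list(zip(rows, target)))


def select_features(features, target, n):
    """Exhaustively pick the n-column subset (of M columns) with maximal joint MI.

    Enumerates all C(M, n) subsets, so it only scales to small M.
    """
    return max(
        combinations(range(len(features)), n),
        key=lambda cols: joint_mi(features, cols, target),
    )


# Toy data (hypothetical): three binary features, target = XOR of the first two.
features = [
    [0, 0, 1, 1],  # x1
    [0, 1, 0, 1],  # x2
    [0, 0, 0, 1],  # x3 (x1 AND x2)
]
target = [a ^ b for a, b in zip(features[0], features[1])]

# Single-feature scores, as a per-feature ranking would see them.
for i in range(len(features)):
    print(f"I(x{i + 1}; Y) = {joint_mi(features, (i,), target):.3f}")

print(select_features(features, target, 2))  # (0, 1)
```

Ranking features one at a time by I(X_i; Y) would put x3 first (about 0.31 bits, versus 0 bits for x1 and x2), whereas the subset search returns (0, 1), the pair that fully determines the target. This is the kind of synergy a single per-feature importance value misses.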