Opening the random forest black box by the analysis of the mutual impact of features - Zhuanzhi Papers


Random forest · MFI · Analysis · High-dimensional data analysis · Relevant features

April 5, 2023

Opening the random forest black box by the analysis of the mutual impact of features


Lucas F. Voges, Lukas C. Jarren, Stephan Seifert

Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples. Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of the features to the outcome and, hence, goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to various simulated data sets and the comparison to other methods for feature selection and relation analysis show that MFI and MIR are very promising for shedding light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g. that features with many possible splits or high minor allele frequencies are preferred.
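The "impurity reduction" in MIR's name refers to a standard quantity in tree-based models: the decrease in node impurity achieved by a split. The following is a minimal sketch of that building block only, assuming Gini impurity and a single binary split on one numeric feature. It is not the authors' MFI or MIR computation, and the function names (`gini`, `impurity_reduction`, `best_split`) and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(values, labels, threshold):
    """Decrease in Gini impurity when splitting samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Child impurities are weighted by the share of samples in each child.
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_split(values, labels):
    """Return (threshold, reduction) for the best midpoint split.

    Assumes the feature has at least two distinct values.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = [(t, impurity_reduction(values, labels, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy data (illustrative only): one feature that separates two classes.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
threshold, reduction = best_split(x, y)
```

A feature with more distinct values yields more candidate thresholds in `best_split`, and therefore more chances to find a high reduction by chance. This is one way to see why plain impurity-based importance can favour features with many possible splits, the bias the abstract refers to.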

