通过地物选择和贝ysian超参数优化的XGBoost风险模型 (A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization) - 专知论文

会员服务 ·

0

xgboost · 优化器 · Weight · Performer · 特征选择 ·

2019 年 1 月 24 日

A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

翻译：通过地物选择和贝ysian超参数优化的XGBoost风险模型

Yan Wang,Xuelei Sherry Ni

from arxiv, Accepted by International Journal of Database Management Systems (IJDMS)

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effect of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank Test. The performance of XGBoost is compared to the traditionally utilized logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from the 10-fold cross validation. Results show that hierarchical clustering is the optimal FS method for LR while weight by Chi-square achieves the best performance in XG-Boost. Both TPE and RS optimization in XGBoost outperform LR significantly. TPE optimization shows a superiority over RS since it results in a significantly higher accuracy and a marginally higher AUC, recall and F1 score. Furthermore, XGBoost with TPE tuning shows a lower variability than the RS method. Finally, the ranking of feature importance based on XGBoost enhances the model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as an operative while powerful approach for business risk modeling.

翻译：本文旨在探索基于极端梯度推升(XGBoost)方法的模型,用于商业风险分类。模型培训同时考虑特征选择(FS)算法和超参数优化。三种最常用的FS方法,包括基尼重量、Chi-square重量、等级变量组合、相关重量和信息重量,都用于减轻冗余特征的影响。两个超参数优化方法,随机搜索(RS)和巴伊西亚树结构型Parzenoo Estimator(TPE),在XGBoost中应用。不同的FS和超参数优化方法对模型性能的影响由威尔科森签署Rank测试调查。XGBost的性能与传统使用的后勤回归模型相比,在分类精确度方面,在曲线下区域(AUC),回顾,和从10倍交叉验证中获得的F1分数。结果显示,等级组合组合是FS方法的最佳方法,而Chi-squread 则在XGB1 级中取得最佳性业绩。TBOost,TPE 和RSerral 最精确性调整方法在XGB 上显示一个显著的底压。

5

相关内容

xgboost

xgboost的全称是eXtreme Gradient Boosting，它是Gradient Boosting Machine的一个C++实现，并能够自动利用CPU的多线程进行并行，同时在算法上加以改进提高了精度。

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

专知会员服务

48+阅读 · 2020年6月21日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

119+阅读 · 2020年5月30日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

122+阅读 · 2020年4月19日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

54+阅读 · 2020年3月13日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

75+阅读 · 2020年2月8日

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

专知会员服务

48+阅读 · 2020年1月1日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

19+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Xgboost算法——Kaggle案例

Xgboost算法——Kaggle案例

R语言中文社区

13+阅读 · 2018年3月13日

【泡泡一分钟】一种基于光场的快速有效深度图估计方法（3dv-43）

【泡泡一分钟】一种基于光场的快速有效深度图估计方法（3dv-43）

泡泡机器人SLAM

4+阅读 · 2018年2月11日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Arxiv

16+阅读 · 2020年3月12日

Meta Learning for Task-Driven Video Summarization

Arxiv

6+阅读 · 2019年7月29日

Hierarchical Meta Learning

Arxiv

9+阅读 · 2019年4月19日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

Optimization Models for Machine Learning: A Survey

Arxiv

18+阅读 · 2019年1月16日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Automatic multi-objective based feature selection for classification

Automatic multi-objective based feature selection for classification

Arxiv

6+阅读 · 2018年7月9日

Prediction of the FIFA World Cup 2018 - A random forest approach with an emphasis on estimated team ability parameters

Arxiv

3+阅读 · 2018年6月13日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

Good Features to Correlate for Visual Tracking

Arxiv

9+阅读 · 2018年3月10日

VIP会员

文章信息

相关主题

相关VIP内容

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

超越深度学习：梯度提升机Gradient Boosting Machines (GBM)，73页ppt

专知会员服务

48+阅读 · 2020年6月21日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

119+阅读 · 2020年5月30日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

122+阅读 · 2020年4月19日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

54+阅读 · 2020年3月13日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

75+阅读 · 2020年2月8日

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

专知会员服务

48+阅读 · 2020年1月1日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

19+阅读 · 2019年12月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Xgboost算法——Kaggle案例

Xgboost算法——Kaggle案例

R语言中文社区

13+阅读 · 2018年3月13日

【泡泡一分钟】一种基于光场的快速有效深度图估计方法（3dv-43）

【泡泡一分钟】一种基于光场的快速有效深度图估计方法（3dv-43）

泡泡机器人SLAM

4+阅读 · 2018年2月11日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Arxiv

16+阅读 · 2020年3月12日

Meta Learning for Task-Driven Video Summarization

Arxiv

6+阅读 · 2019年7月29日

Hierarchical Meta Learning

Arxiv

9+阅读 · 2019年4月19日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

Optimization Models for Machine Learning: A Survey

Arxiv

18+阅读 · 2019年1月16日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Automatic multi-objective based feature selection for classification

Automatic multi-objective based feature selection for classification

Arxiv

6+阅读 · 2018年7月9日

Prediction of the FIFA World Cup 2018 - A random forest approach with an emphasis on estimated team ability parameters

Arxiv

3+阅读 · 2018年6月13日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

Good Features to Correlate for Visual Tracking

Arxiv

9+阅读 · 2018年3月10日

微信扫码咨询专知VIP会员