CatBoost model with synthetic features in application to loan risk assessment of small businesses - 专知 (Zhuanzhi) Papers


Synthetic features · AUC · Model evaluation · Datasets · MoDELS

June 17, 2021

CatBoost model with synthetic features in application to loan risk assessment of small businesses


Liexin Cheng, Haoxue Wang

Loan risk for small businesses has long been a complex problem worth exploring. Predicting loan risk, even approximately, can benefit entrepreneurship by helping create more jobs for society. CatBoost (Categorical Boosting) is a powerful machine learning algorithm suited to datasets with many categorical variables, such as those used for forecasting loan risk. In this paper, we identify the important risk factors that contribute to the loan status classification problem. We then compare the performance of boosting-type algorithms (especially CatBoost) with that of other traditional yet popular ones. The dataset we adopt comes from the U.S. Small Business Administration (SBA) and has a very large sample size (899,164 observations and 27 features). We obtain an accuracy of 95.74% and an AUC of 98.59%, both high compared with the existing literature. To make the best use of the important features in the dataset, we propose a technique named "synthetic generation" that develops additional combined features based on arithmetic operations, which ends up improving the accuracy and AUC of the original CatBoost model.
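The abstract does not spell out how "synthetic generation" is implemented, so the following is only a minimal sketch of one plausible reading: combining pairs of numeric features with the four basic arithmetic operations. The column names (`loan_amount`, `guaranteed_amount`, `term_months`), the sample values, and the helper `add_synthetic_features` are hypothetical and chosen for illustration; they are not taken from the paper or the SBA dataset.

```python
from itertools import combinations

# Hypothetical toy records; names and values are illustrative assumptions.
ROWS = [
    {"loan_amount": 50000.0, "guaranteed_amount": 25000.0, "term_months": 84.0},
    {"loan_amount": 120000.0, "guaranteed_amount": 90000.0, "term_months": 60.0},
]


def add_synthetic_features(rows, columns):
    """Return copies of rows extended with pairwise arithmetic combinations.

    For every unordered pair (a, b) of the given numeric columns, four
    combined features are added: a + b, a - b, a * b, and a / b.
    """
    augmented_rows = []
    for row in rows:
        new_row = dict(row)  # keep the original features untouched
        for a, b in combinations(columns, 2):
            x, y = row[a], row[b]
            new_row[f"{a}_plus_{b}"] = x + y
            new_row[f"{a}_minus_{b}"] = x - y
            new_row[f"{a}_times_{b}"] = x * y
            # Guard against division by zero; None marks an undefined ratio.
            new_row[f"{a}_div_{b}"] = x / y if y != 0 else None
        augmented_rows.append(new_row)
    return augmented_rows


augmented = add_synthetic_features(
    ROWS, ["loan_amount", "guaranteed_amount", "term_months"]
)
# Each row now holds the original columns plus four features per column pair.
print(sorted(augmented[0].keys()))
```

In a workflow like the one the abstract describes, the augmented rows would presumably be fed to a gradient boosting classifier such as CatBoost and the resulting accuracy and AUC compared against the model trained on the original features alone. Restricting the combinations to the most important features keeps the number of generated columns manageable, since it grows quadratically with the number of inputs.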

