An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model

We propose predictive models that estimate a GBM patient's health status one year after treatment (classification task) and predict the long-term prognosis of GBM patients at an individual level (survival task). We used the clinical profiles of 467 GBM patients, each consisting of 13 features and two follow-up dates. As baselines for the random forest classifier (RFC) and the random survival forest (RSF) model, we used a generalized linear model (GLM) and a support vector machine (SVM) for the former, and a Cox proportional hazards model (COX) and an accelerated failure time model (AFT) for the latter. After preprocessing and fixing a stratified 5-fold split of the data set, we generated the best-performing model of each type using recursive feature elimination. This process selected 10, 4, and 13 features for the best-performing one-year survival status RFC, one-year progression status RFC, and RSF models, respectively. In the classification task, the best-performing RFC achieved an AUROC of 0.6990 for one-year survival status and 0.7076 for one-year progression, while the second-best models (GLM in both cases) achieved 0.6691 and 0.6997, respectively. In the survival task, the best-performing RSF model achieved the highest C-index (0.7157) and the lowest integrated Brier score (IBS, 0.1038), while the second-best baseline model scored 0.6556 and 0.1139, respectively. A simplified linear correlation between each feature and GBM patient prognosis, extracted from LIME and virtual patient group analysis, was consistent with established medical knowledge. Our machine learning models suggest that the top three prognostic factors for GBM patient survival are MGMT gene promoter, the extent of resection, and age. To the best of our knowledge, this is the first study to introduce interpretable GBM prognosis prediction models that are consistent with medical knowledge.
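The C-index reported for the survival task measures how often a model ranks two patients in the correct order of risk. The abstract does not say which implementation was used, so the following is only a minimal sketch of Harrell's concordance index in plain Python. The function name `concordance_index`, its arguments, and the toy data are all hypothetical, and skipping pairs with tied observed times is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score (higher means worse expected prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the patient
        # with the shorter time actually had the event (assumption: pairs
        # with tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event: correct
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from the study).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
score = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.7157 and 0.6556 should be read.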
