Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State SPARCS (statewide planning and research cooperative system), consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes consisting of sparse regression and decision trees. We obtained the best performance by using a decision tree with depth 10. We obtained an R-square value of 0.76 which is better than the values reported in the literature for similar problems.
翻译:由于全球医疗费用的迅速上涨,控制医疗费用已成为一个非常重要的问题。重要的一点是价格透明度,由此产生了初步的努力,证明患者将寻找低价,从而推动效率。这需要数据得到公开并产生模型来预测广泛范围的患者和条件下的医疗费用。我们通过使用机器学习技术开发一种预测模型来解决这个问题。我们分析了纽约州的SPARCS (statewide planning and research cooperative system)的去识别化患者数据,包括2016年230万条记录。我们建立了预测从患者诊断和人口统计学特征开始的医疗费用的模型。我们研究了两个模型类,分别为稀疏回归和决策树。通过使用深度为10的决策树,我们获得了最佳性能。我们获得了R平方值为0.76,该值比类似问题的文献报告的值更好。