Background Numerous COVID-19 clinical decision support systems have been developed. However, many of these systems have not been adequately validated, owing to methodological shortcomings that include algorithmic bias. Methods Logistic regression models were trained to predict COVID-19 mortality, ventilator status, and inpatient status on a real-world dataset drawn from four hospitals in New York City, and were analyzed for bias with respect to race, gender, and age. Simple thresholding adjustments were applied during training to establish more equitable models. Results Compared with the naively trained models, the calibrated models showed a 57% decrease in the number of biased trials, while predictive performance, measured by the area under the receiver operating characteristic curve (AUC), remained unchanged. After calibration, the average sensitivity of the predictive models increased from 0.527 to 0.955. Conclusion We demonstrate that naively training and deploying machine learning models on real-world data for COVID-19 predictive analytics carries a high risk of bias. Simply implemented adjustments or calibrations during model training can lead to substantial and sustained gains in fairness on subsequent deployment.
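The thresholding adjustment the abstract describes can be sketched as a per-group choice of decision threshold that equalizes sensitivity (an "equal opportunity"-style calibration). The sketch below is a minimal illustration on synthetic data, not the authors' implementation: the single feature, the binary group label, the hand-rolled risk score standing in for a fitted logistic regression, and the target sensitivity of 0.9 are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort (illustrative, not real patient data): one risk feature,
# a binary outcome, and a binary group label standing in for a protected
# attribute such as race, gender, or an age bracket.
n = 2000
group = rng.integers(0, 2, n)                       # hypothetical protected attribute
x = rng.normal(loc=0.5 * group, scale=1.0, size=n)  # feature shifted by group
score = 1.0 / (1.0 + np.exp(-(1.5 * x - 0.5)))      # stand-in for a fitted model's risk score
y = rng.binomial(1, score)                          # binary outcome

def group_threshold(score, y, mask, target_tpr=0.9):
    """Smallest threshold giving sensitivity >= target_tpr within one group."""
    pos = np.sort(score[mask & (y == 1)])           # positives' scores, ascending
    k = int(np.ceil(target_tpr * len(pos)))         # positives that must be flagged
    return pos[len(pos) - k]                        # threshold keeping the top k

# One threshold per group, rather than a single global cutoff.
thresholds = {g: group_threshold(score, y, group == g) for g in (0, 1)}

for g in (0, 1):
    pos = (group == g) & (y == 1)
    tpr = np.mean(score[pos] >= thresholds[g])
    print(f"group {g}: threshold={thresholds[g]:.3f}, sensitivity={tpr:.3f}")
```

Because only the decision cutoffs move and the underlying scores are untouched, a ranking-based metric such as AUC is unaffected, which is consistent with the abstract's finding that fairness improved while AUC remained unchanged.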