This work considers logistic regression when both the response and the predictor variables may be missing. Several existing approaches are reviewed, including complete case analysis, inverse probability weighting, multiple imputation and maximum likelihood. The methods are compared in a simulation study that evaluates the bias, the variance and the mean squared error of the estimators of the regression coefficients. In the simulations, maximum likelihood gives the best results, followed by multiple imputation with five imputations. The methods are applied to a case study on obesity among schoolchildren in the municipality of Viana do Castelo, North Portugal, where a logistic regression model is used to predict the International Obesity Task Force (IOTF) indicator from physical examinations and past values of the obesity status. All the variables in the case study are potentially missing, with gender as the only exception. The results from the different methods are in good agreement, indicating the relevance of past IOTF values and physical scores for the prediction of obesity. Practical recommendations are given.
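As a rough illustration of how bias, variance and mean squared error can be estimated by simulation, the sketch below runs a complete case analysis for a logistic regression with one predictor. It is not the authors' simulation design: the sample size, number of replications, true coefficients, missingness rates and the missing-completely-at-random mechanism are all assumptions made for this example, and the helper names `fit_logistic` and `summarize` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, for reproducibility only
beta_true = np.array([-0.5, 1.0])    # assumed intercept and slope (hypothetical)
n, R = 500, 200                      # assumed sample size and number of replications

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def summarize(estimates, truth):
    """Monte Carlo bias, variance and MSE of each coefficient estimator."""
    bias = estimates.mean(axis=0) - truth
    variance = estimates.var(axis=0, ddof=1)
    mse = ((estimates - truth) ** 2).mean(axis=0)
    return bias, variance, mse

estimates = np.empty((R, 2))
for r in range(R):
    # Generate complete data from the assumed logistic model.
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    prob = 1.0 / (1.0 + np.exp(-X @ beta_true))
    y = rng.binomial(1, prob).astype(float)

    # Assumed mechanism: predictor and response are each missing completely
    # at random, independently, with rates 0.3 and 0.2.
    x_observed = rng.random(n) > 0.3
    y_observed = rng.random(n) > 0.2

    # Complete case analysis: keep only rows where both are observed.
    keep = x_observed & y_observed
    estimates[r] = fit_logistic(X[keep], y[keep])

bias, variance, mse = summarize(estimates, beta_true)
print("bias:    ", bias)
print("variance:", variance)
print("MSE:     ", mse)
```

Under this assumed mechanism, complete case analysis should be close to unbiased, so the cost of discarding incomplete rows shows up mainly in the variance. Other mechanisms, where missingness depends on observed or unobserved values, would require changing how `x_observed` and `y_observed` are generated.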