Doubly Robust Covariate Shift Regression with Semi-nonparametric Nuisance Models

In contemporary statistical learning, covariate shift correction plays an important role when the distribution of the testing data is shifted from that of the training data. Importance weighting is commonly used to adjust for this shift, but it is not robust to model misspecification or excessive estimation error. In this paper, we propose a doubly robust covariate shift regression approach that introduces an imputation model for the targeted response and uses it to augment the importance weighting equation. With a novel semi-nonparametric construction for the two nuisance models, our method is less prone to the curse of dimensionality than nonparametric approaches and less prone to model misspecification than parametric approaches. To remove the overfitting bias of the nonparametric components under potential model misspecification, we construct calibrated moment estimating equations for the semi-nonparametric models. We show that our estimator is root-n consistent when at least one nuisance model is correctly specified, that estimation of the parametric part of the nuisance models achieves the parametric rate, and that the nonparametric components are rate doubly robust. Simulation studies demonstrate that our method is more robust and efficient than existing parametric and fully nonparametric (machine learning) estimators under various configurations. We also examine the utility of our method through a real example on transfer learning of a phenotyping algorithm for bipolar disorder. Finally, we propose ways to improve the (intrinsic) efficiency of our estimator and to incorporate high-dimensional or machine learning models into our proposed framework.
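
The idea of augmenting the importance weighting equation with an imputation model can be illustrated with a minimal sketch. This is not the semi-nonparametric estimator proposed in the paper: it is the textbook augmented importance-weighting form, applied to the simpler problem of estimating a target-population mean rather than regression coefficients. The function `dr_target_mean`, the Gaussian simulation design, and the deliberately misspecified nuisance models are all assumptions made for illustration.

```python
import numpy as np

def dr_target_mean(x_src, y_src, x_tgt, weight_fn, impute_fn):
    """Augmented importance-weighting estimate of the target mean of Y.

    weight_fn: candidate density ratio p_target(x) / p_source(x)
    impute_fn: candidate imputation model for E[Y | X = x]
    """
    # Imputation term, averaged over the (unlabeled) target covariates.
    imputed_target = impute_fn(x_tgt).mean()
    # Importance-weighted residual correction from the labeled source data.
    correction = (weight_fn(x_src) * (y_src - impute_fn(x_src))).mean()
    return imputed_target + correction

# Hypothetical simulation: source X ~ N(0, 1), target X ~ N(1, 1),
# Y = 1 + 2 X + noise, so the true target mean of Y is 3.
rng = np.random.default_rng(0)
n = 20000
x_src = rng.normal(0.0, 1.0, n)
x_tgt = rng.normal(1.0, 1.0, n)
y_src = 1.0 + 2.0 * x_src + rng.normal(0.0, 1.0, n)

# Correct density ratio N(1, 1) / N(0, 1), and a misspecified constant one.
true_w = lambda x: np.exp(x - 0.5)
wrong_w = lambda x: np.ones_like(x)

# Correct (linear) imputation model fit on source data, and a misspecified
# one that ignores X and always predicts the source mean of Y.
slope, intercept = np.polyfit(x_src, y_src, 1)
good_m = lambda x: intercept + slope * x
y_bar = y_src.mean()
bad_m = lambda x: np.full_like(x, y_bar)

est_good_m_wrong_w = dr_target_mean(x_src, y_src, x_tgt, wrong_w, good_m)
est_bad_m_good_w = dr_target_mean(x_src, y_src, x_tgt, true_w, bad_m)
est_both_wrong = dr_target_mean(x_src, y_src, x_tgt, wrong_w, bad_m)

print("imputation correct, weights wrong:", est_good_m_wrong_w)
print("imputation wrong, weights correct:", est_bad_m_good_w)
print("both wrong:", est_both_wrong)
```

In this toy design, the first two estimates should land near the true target mean of 3, while the estimate with both nuisance models misspecified stays near the source mean of 1. This is the double robustness property in its simplest form: consistency requires only one of the two nuisance models to be correct.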
