Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model for repeated measurements. Machine learning algorithms instead allow us to incorporate complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the partially linear mixed-effects model: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests, which can account for complex interaction terms and nonsmooth structures. The adjusted variables satisfy a linear mixed-effects model, in which the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed-effects coefficient converges at the parametric rate, is asymptotically Gaussian, and is semiparametrically efficient. Two simulation studies demonstrate that our method outperforms a penalized regression spline approach in terms of coverage. We also illustrate our proposed approach on a longitudinal dataset of HIV-infected individuals. Software code for our method is available in the R package dmlalg.
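The partialling-out idea described above can be illustrated with a small, dependency-free Python sketch. It is not the dmlalg implementation, and it makes several simplifying assumptions: the data are simulated with a random intercept per group, a k-nearest-neighbour average stands in for the machine learning algorithm (the abstract suggests random forests, for instance), sample splitting is done at the group level, and the final linear mixed-effects fit is replaced by pooled least squares on the adjusted variables. All function names and parameter values are hypothetical.

```python
import math
import random

def knn_predict(train_w, train_v, query_w, k=15):
    """Estimate E[v | w] at each query point by averaging the k nearest training values."""
    preds = []
    for q in query_w:
        nearest = sorted(range(len(train_w)), key=lambda j: abs(train_w[j] - q))[:k]
        preds.append(sum(train_v[j] for j in nearest) / len(nearest))
    return preds

def simulate(n_groups=60, n_per_group=8, beta=0.5, seed=0):
    """Simulated repeated measurements: y = beta*x + g(w) + random intercept + noise."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        b = rng.gauss(0.0, 0.5)  # group-level random intercept
        for _ in range(n_per_group):
            w = rng.uniform(-2.0, 2.0)                    # nonlinear variable
            x = math.sin(2.0 * w) + rng.gauss(0.0, 1.0)   # linear variable, depends on w
            g_w = (1.0 if w > 0 else -1.0) * w * w        # nonparametric part g(w)
            y = beta * x + g_w + b + rng.gauss(0.0, 0.5)
            rows.append((g, w, x, y))
    return rows

def dml_partial_out(rows, n_folds=2, k=15, seed=1):
    """Regress w out of both x and y, predicting each fold from the other folds."""
    groups = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(groups)
    # Split by group so that all measurements of one group share a fold.
    fold_of = {g: i % n_folds for i, g in enumerate(groups)}
    res_x, res_y = [], []
    for fold in range(n_folds):
        train = [r for r in rows if fold_of[r[0]] != fold]
        test = [r for r in rows if fold_of[r[0]] == fold]
        train_w = [r[1] for r in train]
        test_w = [r[1] for r in test]
        x_hat = knn_predict(train_w, [r[2] for r in train], test_w, k)
        y_hat = knn_predict(train_w, [r[3] for r in train], test_w, k)
        res_x += [r[2] - xh for r, xh in zip(test, x_hat)]
        res_y += [r[3] - yh for r, yh in zip(test, y_hat)]
    return res_x, res_y

def estimate_beta(res_x, res_y):
    """Pooled least squares on the adjusted variables (stand-in for a mixed-effects fit)."""
    return sum(a * b for a, b in zip(res_x, res_y)) / sum(a * a for a in res_x)

rows = simulate()
res_x, res_y = dml_partial_out(rows)
beta_hat = estimate_beta(res_x, res_y)
print(f"estimated fixed effect: {beta_hat:.3f} (simulated truth: 0.5)")
```

In practice the last step would be a proper linear mixed-effects fit on the adjusted variables, which also estimates the variance components; pooled least squares is used here only to keep the sketch short.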