The out-of-sample $R^2$: estimation and inference

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessing the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data.
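
To make the idea of an out-of-sample $R^2$ as a comparison of two predictive models concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes K-fold cross-validation, ordinary least squares as the model of interest, and a null model that predicts the training-fold mean of the outcome. The function name `cv_out_of_sample_r2` and its parameters are hypothetical, and no standard error is computed because the abstract does not give its form.

```python
import numpy as np

def cv_out_of_sample_r2(X, y, n_folds=10, seed=0):
    """Cross-validated R^2 = 1 - MSE(model) / MSE(null).

    Assumptions (not taken from the paper): the model of interest is
    ordinary least squares with an intercept, and the null model predicts
    the training-fold mean of y. X must be a 2-D array (n samples x p features).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    sq_err_model = np.empty(n)
    sq_err_null = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit OLS with an intercept on the training fold only.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_err_model[test_idx] = (y[test_idx] - A_test @ beta) ** 2
        # Null model: predict the training-fold mean for every held-out point.
        sq_err_null[test_idx] = (y[test_idx] - y[train_mask].mean()) ** 2

    # Ratio of pooled held-out squared errors, model versus null.
    return 1.0 - sq_err_model.mean() / sq_err_null.mean()

# Usage on simulated data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=200)
r2 = cv_out_of_sample_r2(X, y)
```

Unlike the in-sample $R^2$, this quantity can be negative: when the model predicts held-out data worse than the training mean does, the ratio of squared errors exceeds one.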
