Many varieties of cross validation would be statistically appealing for the estimation of smoothing and other penalized regression hyperparameters, were it not for the high cost of evaluating such criteria. Here it is shown how to efficiently and accurately compute and optimize a broad variety of cross validation criteria for a wide range of models estimated by minimizing a quadratically penalized loss. The leading-order computational cost of hyperparameter estimation is made comparable to the cost of a single model fit given hyperparameters. In many cases this represents an $O(n)$ computational saving when modelling $n$ data. This development makes it feasible, for the first time, to use leave-out-neighbourhood cross validation to deal with the widespread problem of un-modelled short range autocorrelation, which otherwise leads to underestimation of smoothing parameters. It is also shown how to accurately quantify uncertainty in this case, despite the un-modelled autocorrelation. Practical examples are provided, including smooth quantile regression and generalized additive models for location, scale and shape, with a particular focus on dealing with un-modelled autocorrelation.
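As a point of reference for the kind of saving described above, the classical special case of penalized least squares admits an exact leave-one-out shortcut. The notation here ($X$, $S_\lambda$, $A$, $\hat\mu$) is assumed for illustration and is not taken from the paper, whose methods cover far more general losses and leave-out patterns. With $\hat\beta = \arg\min_\beta \|y - X\beta\|^2 + \beta^T S_\lambda \beta$, $\hat\mu = X\hat\beta$ and $A = X(X^T X + S_\lambda)^{-1} X^T$,

$$\sum_{i=1}^n \left(y_i - \hat\mu_i^{[-i]}\right)^2 = \sum_{i=1}^n \frac{(y_i - \hat\mu_i)^2}{(1 - A_{ii})^2},$$

where $\hat\mu_i^{[-i]}$ is the prediction of $y_i$ from the fit that omits observation $i$. The right hand side requires one fit rather than $n$ refits, which is the sense in which an $O(n)$ saving can arise.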