This paper proposes a novel two-step strategy for testing the goodness-of-fit of parametric regression models in ultra-high dimensional sparse settings, where the predictor dimension far exceeds the sample size. Existing goodness-of-fit tests for regression are usually infeasible in this regime, either because of the curse of dimensionality or because they rely on the asymptotic linearity and normality of parameter estimators, properties that may no longer hold in ultra-high dimensional settings. To address these limitations, our strategy first constructs multiple test statistics, each based on the predictors projected along a distinct direction, and establishes their asymptotic properties under both the null and alternative hypotheses. This projection-based approach substantially mitigates the dimensionality problem: our tests can detect local alternatives converging to the null at the same rate as if the predictor were univariate. An important finding is that test statistics based on linearly independent projections are asymptotically independent under the null hypothesis. Building on this, our second step applies $p$-value combination procedures, such as the minimum $p$-value and the Fisher combination of $p$-values, to form the final tests and enhance power. Theoretically, our tests require only the standard convergence rate of the parameter estimators to derive their limiting distributions, thereby circumventing the need for asymptotic linearity or normality of those estimators. Simulations and real-data applications confirm that our approach provides robust and powerful goodness-of-fit testing in ultra-high dimensional settings.
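To make the second step concrete, the following is a minimal sketch of the two combination rules named above, under the assumption that the $k$ per-projection $p$-values are independent and uniform on $(0,1)$ under the null (the abstract states this only asymptotically). The function names `min_p_combination` and `fisher_combination` are illustrative and not taken from the paper, and the sketch does not reproduce how the paper computes the individual projection-based $p$-values.

```python
import math


def _check(p_values):
    """Validate that every p-value lies in (0, 1]."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")


def min_p_combination(p_values):
    """Combined p-value from the minimum of k independent p-values.

    Under the null, P(min p <= t) = 1 - (1 - t)^k for independent
    uniform p-values, so this is the exact level of the min-p test.
    """
    _check(p_values)
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k


def fisher_combination(p_values):
    """Combined p-value from Fisher's statistic T = -2 * sum(log p_i).

    Under the null, T follows a chi-square law with 2k degrees of
    freedom. For even degrees of freedom the survival function has the
    closed form exp(-T/2) * sum_{i<k} (T/2)^i / i!, so no external
    library is needed.
    """
    _check(p_values)
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals T / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (T/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p-values from three projections (made-up numbers).
    p_vals = [0.01, 0.50, 0.90]
    print("min-p :", min_p_combination(p_vals))
    print("Fisher:", fisher_combination(p_vals))
```

The two rules tend to favor different alternatives: the minimum $p$-value reacts to a single strongly significant projection, while Fisher's rule accumulates moderate evidence spread across several projections.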