Large-scale testing in modern applications such as genomics often entails a trade-off between accuracy and speed: multiplicity corrections push cutoffs deep into the tails, where normal approximations can fail, while resampling is accurate but computationally expensive for large datasets. To resolve this impasse in the context of conditional independence testing, we introduce spaCRT, a closed-form saddlepoint approximation (SPA) for the distilled conditional randomization test (dCRT) that retains the statistical accuracy of dCRT's resampling while avoiding its computational cost. We prove that spaCRT's relative approximation error vanishes asymptotically by establishing a general theorem on the relative error of conditional SPAs. Because dCRT uses a plug-in nuisance regression, we specialize our guarantees to common choices: low-dimensional generalized linear model (GLM), high-dimensional GLM lasso, and kernel ridge regression. Our general theorem is, to our knowledge, the first rigorous technical tool for analyzing SPAs for resampling tests, which had previously been justified only heuristically. It extends beyond spaCRT, as we exemplify by justifying an SPA for the classical sign-flipping location test. Empirically, spaCRT matches dCRT's statistical performance by approximating its p-values with median error 1-12% across settings while delivering a 250x speedup on a single-cell CRISPR screen dataset with 85,000 hypotheses. Building on dCRT's versatility, spaCRT and its open-source R package enable fast and accurate large-scale testing across diverse applications.
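As a rough illustration of the idea in its simplest setting, the sketch below applies a saddlepoint approximation to the classical sign-flipping location test mentioned above and compares it with brute-force resampling. It assumes the textbook Lugannani-Rice tail formula applied to the conditional cumulant generating function K(s) = Σ log cosh(s·x_i) of the sign-flipped sum; the exact formula and safeguards used by spaCRT may differ. The function names (`spa_sign_flip_pvalue`, `mc_sign_flip_pvalue`) and the toy data are hypothetical and are not part of the spaCRT R package.

```python
import math
import random


def normal_sf(z):
    """Upper tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def log_cosh(a):
    """Numerically stable log(cosh(a))."""
    a = abs(a)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def spa_sign_flip_pvalue(x):
    """Right-tailed saddlepoint p-value for T = sum(x) under random sign flips.

    Illustrative sketch (Lugannani-Rice formula), not the spaCRT implementation.
    """
    t = sum(x)
    total = sum(abs(v) for v in x)
    scale = math.sqrt(sum(v * v for v in x))
    if total == 0.0 or t <= -total:
        return 1.0
    if t >= total:  # all nonzero x_i are positive: exact tail mass
        return 0.5 ** sum(1 for v in x if v != 0.0)
    if abs(t) < 1e-6 * scale:  # formula is singular at s = 0: use normal limit
        return normal_sf(t / scale)

    def k1(s):
        # First derivative of the conditional CGF, K'(s).
        return sum(v * math.tanh(s * v) for v in x)

    # Solve K'(s) = |t| by bisection; K' is odd and increasing, so the
    # saddlepoint for t is the solution for |t| with the sign of t.
    target = abs(t)
    lo, hi = 0.0, 1.0
    while k1(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < target:
            lo = mid
        else:
            hi = mid
    s = math.copysign(0.5 * (lo + hi), t)

    k0 = sum(log_cosh(s * v) for v in x)                         # K(s)
    k2 = sum(v * v * (1.0 - math.tanh(s * v) ** 2) for v in x)   # K''(s)
    w = math.copysign(math.sqrt(max(2.0 * (s * t - k0), 0.0)), s)
    u = s * math.sqrt(k2)
    if w == 0.0 or u == 0.0:  # degenerate case: fall back to normal approximation
        return normal_sf(t / scale)
    return normal_sf(w) + normal_pdf(w) * (1.0 / u - 1.0 / w)


def mc_sign_flip_pvalue(x, n_resamples=20000, seed=0):
    """Monte Carlo sign-flipping p-value, the resampling baseline."""
    rng = random.Random(seed)
    t = sum(x)
    hits = 0
    for _ in range(n_resamples):
        t_star = sum(v if rng.random() < 0.5 else -v for v in x)
        if t_star >= t:
            hits += 1
    return (1 + hits) / (1 + n_resamples)


# Toy data (made up for illustration only).
x = [0.5, -1.2, 2.0, 0.3, 1.1, -0.4, 0.9, 1.7, -0.2, 0.8]
print("SPA p-value:", spa_sign_flip_pvalue(x))
print("Resampling p-value:", mc_sign_flip_pvalue(x))
```

The saddlepoint route needs only one scalar root-finding step per hypothesis, whereas the resampling baseline redraws all signs for every resample; that difference is the source of the speedup the abstract describes, although this toy comparison says nothing about the reported 250x figure.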