Replicability is central to scientific progress, and the partial conjunction (PC) hypothesis testing framework provides an objective tool for quantifying it across disciplines. Existing PC methods assume that the studies are independent. Yet many modern applications, such as genome-wide association studies (GWAS) with sample overlap, violate this assumption, which induces dependence among the study-specific summary statistics. Failing to account for this dependence can drastically inflate the type I error rate when inferences are combined. We propose e-Filter, a powerful procedure grounded in the theory of e-values. It consists of a filtering step, which retains a set of the most promising PC hypotheses, and a selection step, in which the retained PC hypotheses are marked as discoveries whenever their e-values exceed a selection threshold. We establish the validity of e-Filter for family-wise error rate (FWER) and false discovery rate (FDR) control under unknown dependence among studies. A comprehensive simulation study demonstrates substantial power gains over competing methods. We apply e-Filter to a GWAS replicability study to identify consistent genetic signals for low-density lipoprotein cholesterol (LDL-C). Here, the participating studies exhibit varying levels of sample overlap, which renders existing methods unsuitable for combining inferences. A subsequent pathway enrichment analysis shows that the signals replicated by e-Filter achieve stronger statistical enrichment in biologically relevant LDL-C pathways than those from competing approaches.
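To make the filter-then-select idea concrete, the following is a minimal sketch and not the e-Filter procedure itself, whose filtering rule and selection threshold are not specified in this abstract. It assumes an `(m, n)` array of study-specific e-values for `m` features across `n` studies and a replicability level `u`. The function names `pc_evalues` and `ebh_select`, the averaging-based PC e-value, and the prescreen `pc_e >= 1` are all illustrative assumptions. The selection step uses the generic e-BH rule with the total number of hypotheses `m` in the threshold, which is a conservative choice.

```python
import numpy as np

def pc_evalues(E, u):
    """Illustrative PC e-values (an assumption, not the paper's construction).

    E : (m, n) array of study-specific e-values.
    u : replicability level; the PC null says fewer than u of the n studies
        carry a true signal, so at least n - u + 1 studies are null.
    Returns, per row, the mean of the n - u + 1 smallest e-values. An average
    of e-values is an e-value under arbitrary dependence, and the smallest
    n - u + 1 values average to no more than any n - u + 1 null e-values.
    """
    E = np.asarray(E, dtype=float)
    n = E.shape[1]
    k = n - u + 1
    return np.sort(E, axis=1)[:, :k].mean(axis=1)

def ebh_select(pc_e, alpha=0.05, keep=None):
    """Generic e-BH-style selection restricted to a prescreened candidate set.

    keep : optional boolean mask of candidates (a hypothetical filter).
    The threshold m / (alpha * rank) uses the full hypothesis count m.
    Returns the sorted indices of selected hypotheses.
    """
    pc_e = np.asarray(pc_e, dtype=float)
    m = pc_e.size
    if keep is None:
        keep = np.ones(m, dtype=bool)
    cand = np.flatnonzero(keep)
    order = cand[np.argsort(-pc_e[cand])]       # candidates, largest e-value first
    ranks = np.arange(1, order.size + 1)
    passing = pc_e[order] >= m / (alpha * ranks)
    if not passing.any():
        return np.array([], dtype=int)
    k = ranks[passing].max()                    # largest rank meeting its threshold
    return np.sort(order[:k])

# Toy example: 4 features, 3 studies, replication required in at least 2 studies.
E = np.array([[100.0, 80.0, 120.0],
              [1.0, 0.5, 200.0],
              [60.0, 90.0, 0.2],
              [0.9, 1.1, 1.0]])
pc_e = pc_evalues(E, u=2)
selected = ebh_select(pc_e, alpha=0.1, keep=pc_e >= 1.0)
print(pc_e, selected)
```

In this toy input, the second feature has one very large e-value but is not selected, because a single strong study cannot establish replication in two studies. With the full count `m` in the threshold, the prescreen here never changes the outcome; a filter that improves power, as the abstract describes, requires the paper's own construction.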