Selecting important features in high-dimensional survival analysis is critical for identifying confirmatory biomarkers under rigorous error control. In this paper, we propose a derandomized knockoffs procedure for Cox regression that stabilizes feature selection while controlling the k-familywise error rate (k-FWER). By aggregating selections across multiple randomized knockoff realizations, our approach mitigates the run-to-run instability commonly observed with conventional knockoffs. Through extensive simulations, we demonstrate that our method consistently outperforms standard knockoffs in both selection power and error control. We also apply the procedure to a clinical dataset on primary biliary cirrhosis (PBC) to identify key prognostic biomarkers associated with patient survival. The results confirm the superior stability of the derandomized knockoffs method, which allows more reliable identification of important clinical variables. The approach handles datasets containing both continuous and categorical covariates, which broadens its utility in real-world biomedical studies. The framework thus provides a robust and interpretable solution for high-dimensional survival analysis and is particularly suited to applications that require precise and stable variable selection.
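The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base procedure, so the code below assumes one common choice from the knockoffs literature: in each run, features are visited in decreasing order of |W_j| (the knockoff statistic) and the walk stops at the v-th negative statistic; a feature is kept in the final set if it is selected in at least a fraction eta of the runs. The function names, the parameters `v` and `eta`, and the toy statistics are all hypothetical, and fitting the Cox model and generating knockoffs are assumed to happen upstream.

```python
from typing import List, Sequence, Set, Tuple


def select_one_run(w: Sequence[float], v: int) -> Set[int]:
    """Assumed base rule: visit features by decreasing |W| and stop at the
    v-th negative statistic; return the positive-W features seen so far."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    selected: Set[int] = set()
    negatives = 0
    for j in order:
        if w[j] < 0:
            negatives += 1
            if negatives >= v:
                break
        elif w[j] > 0:
            selected.add(j)
    return selected


def derandomized_selection(
    w_runs: Sequence[Sequence[float]], v: int, eta: float
) -> Tuple[List[int], List[float]]:
    """Aggregate the per-run selections into selection frequencies and keep
    the features whose frequency is at least eta."""
    n_features = len(w_runs[0])
    counts = [0] * n_features
    for w in w_runs:
        for j in select_one_run(w, v):
            counts[j] += 1
    freqs = [c / len(w_runs) for c in counts]
    selected = [j for j, f in enumerate(freqs) if f >= eta]
    return selected, freqs


if __name__ == "__main__":
    # Toy knockoff statistics from three hypothetical knockoff realizations.
    w_runs = [
        [3.0, 2.0, -0.5, 0.2, -1.0],
        [2.5, -0.4, 1.5, 0.1, -0.9],
        [2.8, 1.9, 0.3, -0.2, -1.2],
    ]
    selected, freqs = derandomized_selection(w_runs, v=1, eta=0.5)
    print("selection frequencies:", freqs)
    print("selected features:", selected)
```

In this toy input, feature 2 is selected in only one of the three runs, so a single knockoff realization could report it while the aggregated rule with eta = 0.5 does not; this is the kind of run-to-run variability that aggregation is meant to damp.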