The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase the efficiency of estimates of Cox model log relative hazards, and there has been some work on estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently available online that implement them. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options, together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
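To make the sampling scheme concrete, the following is a minimal sketch of the simplest setting named above: an unstratified subcohort drawn without replacement, with design weights of 1 for cases and n/m for non-case subcohort members (n the cohort size, m the subcohort size). It is written in Python for illustration only and is not the CaseCohortCoxSurvival interface; the function names, the record fields (`id`, `case`), and the toy cohort are all assumptions.

```python
import random


def make_toy_cohort(n=1000, case_prob=0.05, seed=1):
    """Hypothetical phase-one cohort: an id and a case indicator per person."""
    rng = random.Random(seed)
    return [{"id": i, "case": rng.random() < case_prob} for i in range(n)]


def case_cohort_sample(cohort, subcohort_size, seed=0):
    """Select the phase-two sample and attach design weights.

    Assumes an unstratified subcohort drawn without replacement.
    Cases are always retained (weight 1); non-cases are retained only if
    they fall in the subcohort (weight n / m, the inverse sampling fraction).
    """
    rng = random.Random(seed)
    n = len(cohort)
    # random.sample draws without replacement
    subcohort_ids = set(rng.sample([p["id"] for p in cohort], subcohort_size))

    phase_two = []
    for person in cohort:
        in_subcohort = person["id"] in subcohort_ids
        if person["case"]:
            weight = 1.0
        elif in_subcohort:
            weight = n / subcohort_size
        else:
            continue  # non-case outside the subcohort: no covariate data collected
        phase_two.append({**person, "in_subcohort": in_subcohort, "weight": weight})
    return phase_two


cohort = make_toy_cohort()
phase_two = case_cohort_sample(cohort, subcohort_size=100)
```

Only the records in `phase_two` would need full covariate measurement. A weighted Cox fit would then use `weight` for each record, and the variance calculation would have to account for the two-phase sampling, as the abstract notes.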