Interim analyses are vital in clinical trials for early decision-making. While the frequentist implications of repeated interim looks are well established, the consequences of repeated Bayesian interim monitoring for efficacy, specifically regarding multiplicity, remain contentious. This article provides theoretical justification and numerical evidence evaluating the impact of such designs on bias, mean squared error (MSE), credible interval coverage, false discovery rate (FDR), and average Type I error (ATIE). Our findings show that when the inferential prior matches the data-generating prior, sequential efficacy stopping neither biases the posterior mean nor degrades credible interval coverage. However, even under this ``matched'' condition, the FDR, ATIE, and MSE are substantially altered. In the more practically relevant scenario where the inferential and data-generating priors differ, all of these operating characteristics, including estimation bias and coverage, are substantially affected. These results reconcile long-standing conflicting arguments regarding Bayesian multiplicity: some Bayesian properties are invariant to sequential looks, while others are not. Our work underscores the necessity of thoughtful prior specification and comprehensive evaluation of both frequentist and Bayesian operating characteristics to ensure reliable inference in adaptive trial designs.
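The matched versus mismatched contrast described above can be illustrated with a small simulation. The following is a minimal sketch, not the article's own numerical study: it assumes a conjugate normal model with unit-variance observations, five equally spaced looks, and a stopping rule of posterior probability of a positive effect above 0.975. The function name `simulate`, the prior parameters, the look schedule, and the threshold are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # half-width multiplier for a 95% credible interval


def simulate(gen_prior, inf_prior, looks=(10, 20, 30, 40, 50),
             threshold=0.975, n_trials=20000, seed=1):
    """Average bias of the posterior mean and 95% credible interval coverage.

    Assumed model (illustrative only): theta ~ N(gen_mean, gen_sd^2) generates
    the truth, y_i ~ N(theta, 1), and inference uses the prior
    N(inf_mean, inf_sd^2). A trial stops at the first look where
    P(theta > 0 | data) exceeds `threshold`, else at the final look.
    """
    rng = random.Random(seed)
    gen_mean, gen_sd = gen_prior
    inf_mean, inf_sd = inf_prior
    prior_prec = 1.0 / inf_sd ** 2
    bias_sum, covered = 0.0, 0
    for _ in range(n_trials):
        theta = rng.gauss(gen_mean, gen_sd)
        total, n = 0.0, 0
        for look in looks:
            k = look - n
            # The sum of k new N(theta, 1) observations is N(k * theta, k).
            total += rng.gauss(k * theta, math.sqrt(k))
            n = look
            # Conjugate normal update under the inferential prior.
            post_prec = prior_prec + n
            post_mean = (prior_prec * inf_mean + total) / post_prec
            post_sd = math.sqrt(1.0 / post_prec)
            # P(theta > 0 | data) for a normal posterior.
            if STD_NORMAL.cdf(post_mean / post_sd) > threshold:
                break
        bias_sum += post_mean - theta
        covered += abs(theta - post_mean) <= Z95 * post_sd
    return bias_sum / n_trials, covered / n_trials


# Matched: the inferential prior equals the data-generating prior.
bias_matched, cov_matched = simulate(gen_prior=(0.0, 0.2), inf_prior=(0.0, 0.2))
# Mismatched: a vague inferential prior, same data-generating prior.
bias_mismatch, cov_mismatch = simulate(gen_prior=(0.0, 0.2), inf_prior=(0.0, 10.0))
print(f"matched:    bias={bias_matched:+.4f}, coverage={cov_matched:.3f}")
print(f"mismatched: bias={bias_mismatch:+.4f}, coverage={cov_mismatch:.3f}")
```

Here bias and coverage are averaged over the data-generating prior, which is the sense in which the matched case is expected to be unaffected by stopping. Under these assumptions the matched run should show bias near zero and coverage near the nominal 95%, while the vague-prior run should overestimate on average because early stops lock in large estimates.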