Clinical trials provide essential guidance for practicing Evidence-Based Medicine, but they often come at great cost and risk. To optimize the design of clinical trials, we introduce a novel Clinical Trial Result Prediction (CTRP) task. In the CTRP framework, a model takes a PICO-formatted clinical trial proposal with its background as input and predicts the result, i.e., how the Intervention group compares with the Comparison group in terms of the measured Outcome in the studied Population. Because structured clinical evidence is prohibitively expensive to collect manually, we instead exploit large-scale unstructured sentences from the medical literature that implicitly contain PICOs and results as evidence. Specifically, we pre-train a model to predict the disentangled results from such implicit evidence and fine-tune it with limited data on the downstream datasets. Experiments on the benchmark Evidence Integration dataset show that the proposed model outperforms the baselines by large margins, e.g., with a 10.7% relative gain over BioBERT in macro-F1. The performance improvement is also validated on another dataset composed of clinical trials related to COVID-19.
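To make the task definition concrete, the following is a minimal sketch of a CTRP example and of the metrics mentioned above. It is an illustration under stated assumptions, not the implementation used in this work: the names `TrialProposal`, `to_model_input`, `macro_f1`, and `relative_gain`, the three-way label set, and the flattening format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed label set: the abstract only states that the result describes how the
# Intervention group compares with the Comparison group, so a three-way scheme
# is a hypothetical choice made for this sketch.
LABELS = ("higher", "no_difference", "lower")


@dataclass(frozen=True)
class TrialProposal:
    """A PICO-formatted trial proposal plus its background (hypothetical schema)."""
    background: str
    population: str
    intervention: str
    comparison: str
    outcome: str

    def to_model_input(self) -> str:
        # Flatten the fields into one string that a text classifier could read.
        # The separator and field order are assumptions, not a documented format.
        return (
            f"background: {self.background} | population: {self.population} | "
            f"intervention: {self.intervention} | comparison: {self.comparison} | "
            f"outcome: {self.outcome}"
        )


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A label that never appears in gold or pred contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score
```

Under these assumptions, a model would map `proposal.to_model_input()` to one of `LABELS`, and a statement such as "a 10.7% relative gain in macro-F1" corresponds to `relative_gain(model_f1, baseline_f1)` being roughly 0.107.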