Clinical case reports encode temporal patient trajectories that are often underexploited by traditional machine learning methods relying on structured data. In this work, we introduce the problem of forecasting from textual time series, where timestamped clinical findings -- extracted via an LLM-assisted annotation pipeline -- serve as the primary input for prediction. We systematically evaluate a diverse suite of models, including fine-tuned decoder-based large language models and encoder-based transformers, on tasks of event occurrence prediction, temporal ordering, and survival analysis. Our experiments reveal that encoder-based models consistently achieve higher F1 scores and superior temporal concordance for short- and long-horizon event forecasting, while fine-tuned masking approaches enhance ranking performance. In contrast, instruction-tuned decoder models demonstrate a relative advantage in survival analysis, especially in early prognosis settings. Our sensitivity analyses further demonstrate the importance of time ordering, which requires explicit clinical time series construction, over text ordering, the format of the text inputs on which LLMs are classically trained. This highlights the additional benefit that can be gained from time-ordered corpora, with implications for temporal tasks in the era of widespread LLM use.