Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes.

Materials and Methods: The pipeline extracts the ROS section from a clinical note using SecTag header terminology, then applies few-shot LLMs to identify ROS entities such as diseases or symptoms, their positive/negative status, and the associated body systems. We implemented the pipeline with four open-source LLMs: llama3.1:8b, gemma3:27b, mistral3.1:24b, and gpt-oss:20b. Additionally, we introduced a novel attribution algorithm that aligns LLM-identified ROS entities with their source text, handling non-exact and synonymous matches. The evaluation was conducted on 24 general medicine notes containing 340 annotated ROS entities.

Results: Open-source LLMs enable a local, cost-efficient pipeline while delivering promising performance. The larger models (Gemma, Mistral, and gpt-oss) demonstrated robust performance across the pipeline's three entity recognition tasks: ROS entity extraction, negation detection, and body system classification (highest F1 score = 0.952). With the attribution algorithm, all models improved on key performance metrics, achieving higher F1 scores and accuracy along with lower error rates. Notably, the smaller Llama model also achieved promising results despite using only one-third of the VRAM required by the larger models.

Discussion and Conclusion: From an application perspective, our pipeline provides a scalable, locally deployable solution for easing the ROS documentation burden, and open-source LLMs offer a practical AI option for resource-limited healthcare settings. Methodologically, our newly developed attribution algorithm improves the accuracy of zero- and few-shot LLMs in named entity recognition.
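To make the few-shot extraction step concrete, the following is a minimal sketch that assumes the listed models are served locally through Ollama's /api/generate REST endpoint; the prompt wording, the JSON output schema, and the function name extract_ros_entities are illustrative assumptions rather than the exact prompts or code used in the study.

```python
# Sketch of few-shot ROS entity extraction against a locally served LLM (assumed Ollama).
import json
import requests

# Illustrative few-shot prompt: one worked example plus an explicit JSON schema.
FEW_SHOT_PROMPT = (
    "Extract Review of Systems entities from the note section below.\n"
    'Return JSON of the form {"entities": [{"entity": ..., '
    '"status": "positive" or "negative", "body_system": ...}]}.\n\n'
    'Example section: "Denies chest pain. Reports occasional cough."\n'
    'Example output: {"entities": ['
    '{"entity": "chest pain", "status": "negative", "body_system": "cardiovascular"}, '
    '{"entity": "cough", "status": "positive", "body_system": "respiratory"}]}\n\n'
    "Section:\n"
)


def extract_ros_entities(ros_section: str, model: str = "gemma3:27b") -> list[dict]:
    """Send the ROS section to a local model and parse the returned entity list."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": FEW_SHOT_PROMPT + ros_section,
            "stream": False,
            "format": "json",  # constrain the model to emit valid JSON
        },
        timeout=300,
    )
    resp.raise_for_status()
    data = json.loads(resp.json()["response"])
    return data.get("entities", []) if isinstance(data, dict) else data
```

The same call works for any of the four models by changing the model tag, which is what makes the pipeline straightforward to run entirely on local hardware.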
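The attribution step, which maps each LLM-identified entity back to a span of the source ROS text, can be sketched as below. Because the abstract does not detail the algorithm, the sliding-window fuzzy matching and the small SYNONYMS table here are stand-in assumptions that only illustrate how non-exact and synonymous matches might be handled.

```python
# Illustrative sketch of entity-to-source attribution with fuzzy and synonym matching.
import difflib
import re

# Hypothetical synonym table mapping model phrasings to note phrasings.
SYNONYMS = {
    "shortness of breath": ["dyspnea", "sob"],
    "stomach pain": ["abdominal pain"],
}


def attribute_entity(entity: str, source_text: str, threshold: float = 0.8):
    """Return (matched_span, score) for the best-matching source span, or (None, 0.0)."""
    candidates = [entity.lower()] + SYNONYMS.get(entity.lower(), [])
    tokens = re.findall(r"\w+", source_text.lower())
    best_span, best_score = None, 0.0
    # Slide a window of roughly the candidate's length over the source tokens
    # and keep the window most similar to any candidate form of the entity.
    for cand in candidates:
        width = max(1, len(cand.split()))
        for i in range(len(tokens) - width + 1):
            span = " ".join(tokens[i:i + width])
            score = difflib.SequenceMatcher(None, cand, span).ratio()
            if score > best_score:
                best_span, best_score = span, score
    return (best_span, best_score) if best_score >= threshold else (None, 0.0)
```

In this sketch an entity is counted as attributed only when its best span clears the similarity threshold, which is the general mechanism by which such an alignment step can reduce spurious extractions and improve F1, accuracy, and error rate.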