ECG-LLM: Training and Evaluation of Domain-Specific Large Language Models for Electrocardiography

Domain-adapted open-weight large language models (LLMs) offer promising healthcare applications, from queryable knowledge bases to multimodal assistants, with the crucial advantage of local deployment for privacy preservation. However, optimal adaptation strategies, evaluation methodologies, and performance relative to general-purpose LLMs remain poorly characterized. We investigated these questions in electrocardiography, an important area of cardiovascular medicine, by finetuning open-weight models on domain-specific literature and implementing a multi-layered evaluation framework comparing finetuned models, retrieval-augmented generation (RAG), and Claude Sonnet 3.7 as a representative general-purpose model. Finetuned Llama 3.1 70B achieved superior performance on multiple-choice evaluations and automatic text metrics, ranking second to Claude 3.7 in LLM-as-a-judge assessments. Human expert evaluation favored Claude 3.7 and RAG approaches for complex queries. Finetuned models significantly outperformed their base counterparts across nearly all evaluation modes. Our findings reveal substantial performance heterogeneity across evaluation methodologies, underscoring assessment complexity. Nevertheless, domain-specific adaptation through finetuning and RAG achieves competitive performance with proprietary models, supporting the viability of privacy-preserving, locally deployable clinical solutions.

