Training mental health clinicians to conduct standardized clinical assessments is challenging because scalable, realistic practice opportunities are scarce, which can compromise data quality in clinical trials. To address this gap, we introduce a voice-enabled virtual patient simulation system powered by a large language model (LLM). This study describes the system's development and validates its ability to generate virtual patients who accurately adhere to pre-defined clinical profiles, maintain coherent narratives, and produce realistic dialogue. The system uses an LLM to simulate patients with specified symptom profiles, demographics, and communication styles. It was evaluated by 5 experienced clinical raters, who conducted 20 simulated structured Montgomery-Åsberg Depression Rating Scale (MADRS) interviews across 4 virtual patient personas. The virtual patients adhered closely to their clinical profiles: the mean item-level difference between rater-assigned and configured MADRS scores was 0.52 (SD=0.75). Inter-rater reliability across items was 0.90 (95% CI=0.68-0.99). Expert raters consistently rated the qualitative realism and cohesiveness of the virtual patients favorably, with average ratings between "Agree" and "Strongly Agree." Our findings suggest that LLM-powered virtual patient simulations are a viable and scalable tool for training clinicians, capable of producing high-fidelity, clinically relevant practice scenarios.
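The adherence figure reported in the abstract (mean item difference and its SD) can be illustrated with a minimal sketch. It assumes the difference is the absolute gap between a rater-assigned and a configured item score, pooled over all items and interviews, and that the SD is the sample standard deviation; the abstract does not state either choice. The function names, item names, and scores below are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def item_differences(configured, rated):
    """Absolute per-item gaps between rated and configured scores (assumed metric)."""
    if configured.keys() != rated.keys():
        raise ValueError("configured and rated scores must cover the same items")
    return [abs(rated[item] - configured[item]) for item in configured]


def summarize_adherence(interviews):
    """Pool item gaps over all interviews and return (mean, sample SD)."""
    diffs = []
    for configured, rated in interviews:
        diffs.extend(item_differences(configured, rated))
    return mean(diffs), stdev(diffs)


# Hypothetical example: two interviews, two MADRS-style items each (0-6 scale).
example_interviews = [
    ({"reported_sadness": 4, "inner_tension": 2},
     {"reported_sadness": 4, "inner_tension": 3}),
    ({"reported_sadness": 0, "inner_tension": 6},
     {"reported_sadness": 2, "inner_tension": 5}),
]

mean_diff, sd_diff = summarize_adherence(example_interviews)
print(f"mean item difference = {mean_diff:.2f} (SD = {sd_diff:.2f})")
```

Pooling every item from every interview weights each item rating equally; averaging within an interview first would be an equally plausible reading and would generally give a different SD.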