CNS-Obsidian: A Neurosurgical Vision-Language Model Built From Scientific Publications

Anton Alyakin, Jaden Stryker, Daniel Alexander Alber, Jin Vivian Lee, Karl L. Sangwon, Brandon Duderstadt, Akshay Save, David Kurland, Spencer Frome, Shrutika Singh, Jeff Zhang, Eunice Yang, Ki Yun Park, Cordelia Orillac, Aly A. Valliani, Sean Neifert, Albert Liu, Aneek Patel, Christopher Livia, Darryl Lau, Ilya Laufer, Peter A. Rozman, Eveline Teresa Hidalgo, Howard Riina, Rui Feng, Todd Hollon, Yindalon Aphinyanaphongs, John G. Golfinos, Laura Snyder, Eric Leuthardt, Douglas Kondziolka, Eric Karl Oermann

General-purpose VLMs demonstrate impressive capabilities, but their opaque training on uncurated internet data poses critical limitations for high-stakes decision-making, such as in neurosurgery. We present CNS-Obsidian, a neurosurgical VLM trained on peer-reviewed literature, and demonstrate its clinical utility versus GPT-4o in a real-world setting. We compiled 23,984 articles from Neurosurgery Publications journals, yielding 78,853 figures and captions. Using GPT-4o and Claude Sonnet-3.5, we converted these into 263,064 training samples across three formats: instruction fine-tuning, multiple-choice questions, and differential diagnosis. We trained CNS-Obsidian, a fine-tune of the 34-billion parameter LLaVA-Next model. In a blinded, randomized trial at NYU Langone Health (Aug 30-Nov 30, 2024), neurosurgery consultations were assigned to either CNS-Obsidian or a HIPAA-compliant GPT-4o endpoint as diagnostic co-pilot after consultations. Primary outcomes were diagnostic helpfulness and accuracy, assessed via user ratings and presence of correct diagnosis within the VLM-provided differential. CNS-Obsidian matched GPT-4o on synthetic questions (76.13% vs 77.54%, p=0.235), but only achieved 46.81% accuracy on human-generated questions versus GPT-4o's 65.70% (p<10⁻¹⁵). In the randomized trial, 70 consultations were evaluated (32 CNS-Obsidian, 38 GPT-4o) from 959 total consults (7.3% utilization). CNS-Obsidian received positive ratings in 40.62% of cases versus 57.89% for GPT-4o (p=0.230). Both models included correct diagnosis in approximately 60% of cases (59.38% vs 65.79%, p=0.626). Domain-specific VLMs trained on curated scientific literature can approach frontier model performance despite being orders of magnitude smaller and less expensive to train. This establishes a transparent framework for scientific communities to build specialized AI models.
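
The trial percentages above can be sanity-checked with a little arithmetic. The sketch below is not from the paper: the per-arm counts (13, 22, 19, 25) are assumptions, inferred as the only integers that reproduce the reported percentages given the stated arm sizes of 32 and 38. The abstract does not name the statistical tests behind the p-values, so none is reproduced here.

```python
# Arithmetic check of the trial figures reported in the abstract.
# The counts in `positive` and `correct` are INFERRED from the reported
# percentages; they are not stated in the abstract.

TOTAL_CONSULTS = 959
ARM_SIZES = {"CNS-Obsidian": 32, "GPT-4o": 38}

# Inferred number of consultations with a positive user rating.
positive = {"CNS-Obsidian": 13, "GPT-4o": 22}
# Inferred number of consultations whose differential held the correct diagnosis.
correct = {"CNS-Obsidian": 19, "GPT-4o": 25}


def pct(numerator: int, denominator: int) -> str:
    """Return a percentage formatted to two decimals, as in the abstract."""
    return f"{100 * numerator / denominator:.2f}"


evaluated = sum(ARM_SIZES.values())
utilization = f"{100 * evaluated / TOTAL_CONSULTS:.1f}"

for model, n in ARM_SIZES.items():
    print(
        f"{model}: positive {pct(positive[model], n)}%, "
        f"correct diagnosis {pct(correct[model], n)}%"
    )
print(f"Evaluated {evaluated} of {TOTAL_CONSULTS} consults ({utilization}%)")
```

With roughly 35 consultations per arm, each additional case shifts a rate by about three percentage points, which is consistent with neither trial difference reaching significance.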
