Recent pathology foundation models have substantially advanced visual representation learning and multimodal interaction. However, most still rely on a static inference paradigm in which whole-slide images are processed once to produce a prediction, with no reassessment or targeted evidence acquisition when the diagnosis is ambiguous. This contrasts with clinical diagnostic workflows, which refine hypotheses through repeated slide review and requests for further examinations. We propose PathFound, an agentic multimodal model designed to support evidence-seeking inference in pathological diagnosis. PathFound combines pathology visual foundation models, vision-language models, and reasoning models trained with reinforcement learning to perform proactive information acquisition and diagnosis refinement, progressing through initial-diagnosis, evidence-seeking, and final-decision stages. Across several large multimodal models, adopting this strategy consistently improves diagnostic accuracy, indicating the effectiveness of evidence-seeking workflows in computational pathology. Among these models, PathFound achieves state-of-the-art diagnostic performance across diverse clinical scenarios and shows strong potential to uncover subtle details such as nuclear features and local invasion.
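To make the three-stage workflow concrete, the following is a minimal sketch of an evidence-seeking diagnostic loop of the kind described above. It is illustrative only: the function names (initial_diagnosis, request_evidence, final_decision), the confidence threshold, and the round limit are hypothetical placeholders and do not reflect PathFound's actual interface or training setup.

```python
from dataclasses import dataclass, field

@dataclass
class Diagnosis:
    label: str
    confidence: float
    evidence: list = field(default_factory=list)

def initial_diagnosis(slide: str) -> Diagnosis:
    # Stage 1 (hypothetical): a single pass over the whole-slide image
    # yields a provisional label with a confidence estimate.
    return Diagnosis(label="suspected carcinoma", confidence=0.62)

def request_evidence(slide: str, dx: Diagnosis) -> str:
    # Stage 2 (hypothetical): proactively acquire targeted evidence,
    # e.g. a high-magnification patch around an ambiguous region,
    # instead of stopping at the static prediction.
    return f"high-power field examined for '{dx.label}'"

def final_decision(dx: Diagnosis) -> Diagnosis:
    # Stage 3 (hypothetical): refine the hypothesis in light of the
    # accumulated evidence before committing to a final answer.
    dx.confidence = min(1.0, dx.confidence + 0.1 * len(dx.evidence))
    return dx

def diagnose(slide: str, threshold: float = 0.8, max_rounds: int = 3) -> Diagnosis:
    # Loop: keep requesting evidence until the diagnosis is confident
    # enough or the round budget is exhausted, then decide.
    dx = initial_diagnosis(slide)
    for _ in range(max_rounds):
        if dx.confidence >= threshold:
            break
        dx.evidence.append(request_evidence(slide, dx))
        dx = final_decision(dx)
    return dx

if __name__ == "__main__":
    print(diagnose("example_wsi.svs"))
```

The point of the sketch is the control flow, not the stubs: unlike a single-pass predictor, the loop only terminates once the working hypothesis is either sufficiently supported by acquired evidence or the evidence budget runs out.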