Multimodal clinical reasoning in gastrointestinal (GI) oncology requires the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Multimodal Large Language Models (MLLMs) show clear potential, but they often suffer from context dilution and hallucination when given intricate, heterogeneous medical histories. To address these limitations, this paper proposes a hierarchical Multi-Agent Framework that emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, a substantial improvement over the monolithic baseline. The agent-based architecture yielded its largest gains in reasoning logic and medical accuracy. These findings indicate that mimetic, agent-based collaboration provides a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.