XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography

Vision-language models (VLMs) have recently shown remarkable zero-shot performance in medical image understanding, yet their grounding ability, the extent to which textual concepts align with visual evidence, remains underexplored. In the medical domain, reliable grounding is essential for interpretability and clinical adoption. In this work, we present the first systematic benchmark for evaluating cross-modal interpretability in chest X-rays across seven CLIP-style VLM variants. We generate visual explanations using cross-attention and similarity-based localization maps, and quantitatively assess their alignment with radiologist-annotated regions across multiple pathologies. Our analysis reveals that: (1) while all VLM variants demonstrate reasonable localization for large and well-defined pathologies, their performance degrades substantially for small or diffuse lesions; (2) models pretrained on chest X-ray-specific datasets exhibit improved alignment compared to those trained on general-domain data; and (3) a model's overall recognition ability and its grounding ability are strongly correlated. These findings underscore that current VLMs, despite their strong recognition ability, still fall short of clinically reliable grounding, highlighting the need for targeted interpretability benchmarks before deployment in medical practice. XBench code is available at https://github.com/Roypic/Benchmarkingattention
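To make the evaluation idea concrete, the following is a minimal sketch of a similarity-based localization map and two common ways of scoring it against an annotated region. It is not the XBench implementation: the abstract does not name the metrics used, so intersection-over-union (IoU) and a pointing-game hit are assumptions here, as are all function names, the 0.5 threshold, and the toy embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_map(patch_embeddings, text_embedding):
    """Grid of patch-to-text cosine similarities (hypothetical helper).

    patch_embeddings is an H x W grid of image-patch vectors and
    text_embedding is the vector of a pathology prompt.
    """
    return [[cosine(p, text_embedding) for p in row] for row in patch_embeddings]

def normalize(grid):
    """Min-max scale a grid to [0, 1]; a constant grid maps to all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in grid]

def iou(heatmap, mask, threshold=0.5):
    """IoU between the thresholded heatmap and a binary annotation mask.

    The threshold value is an assumption, not taken from the paper.
    """
    inter = union = 0
    for hrow, mrow in zip(heatmap, mask):
        for h, m in zip(hrow, mrow):
            pred, truth = h >= threshold, bool(m)
            inter += int(pred and truth)
            union += int(pred or truth)
    return inter / union if union else 0.0

def pointing_hit(heatmap, mask):
    """True if the heatmap's maximum falls inside the annotated region."""
    _, r, c = max(
        (v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row)
    )
    return bool(mask[r][c])
```

On a toy 2 x 2 grid, the map would be built with `normalize(similarity_map(patches, text))` and then passed to `iou` and `pointing_hit` together with a binary mask derived from the radiologist annotation. A real pipeline would take the patch and text embeddings from a CLIP-style model and upsample the map to image resolution before scoring.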

