The interpretability of medical image analysis models is considered a key research field. We use a dataset of eye-tracking data from five radiologists to compare the outputs of interpretability methods with heatmaps representing where the radiologists looked. We conduct a class-independent analysis of the saliency maps generated by two methods selected from the literature: Grad-CAM and attention maps from an attention-gated model. For the comparison, we use shuffled metrics, which avoid biases from fixation locations. We achieve scores comparable to an interobserver baseline in one shuffled metric, highlighting the potential of saliency maps from Grad-CAM to mimic a radiologist's attention over an image. We also divide the dataset into subsets to evaluate in which cases the similarity is higher.
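To illustrate what a shuffled metric looks like in practice, the following is a minimal sketch of one common variant, the shuffled AUC. The abstract does not say which shuffled metrics were used, so this choice is an assumption, as are the function name `shuffled_auc`, its parameters, and the representation of fixations as integer (row, column) pairs. The idea is to draw the negative samples from fixation locations recorded on other images rather than from uniformly random pixels, so a map that only encodes where observers tend to look in general scores near chance.

```python
import numpy as np


def shuffled_auc(saliency, fixations, other_fixations):
    """Shuffled AUC sketch (hypothetical helper, not the authors' code).

    saliency        : 2D array, saliency map for one image
    fixations       : (N, 2) int array of (row, col) fixations on this image
    other_fixations : (M, 2) int array of (row, col) fixations taken from
                      other images, used as the negative set
    """
    saliency = np.asarray(saliency, dtype=float)
    fixations = np.asarray(fixations, dtype=int)
    other_fixations = np.asarray(other_fixations, dtype=int)

    # Saliency values at true fixations (positives) and at fixation
    # locations borrowed from other images (negatives).
    pos = saliency[fixations[:, 0], fixations[:, 1]]
    neg = saliency[other_fixations[:, 0], other_fixations[:, 1]]

    # AUC as the probability that a positive outranks a negative,
    # counting ties as one half (Mann-Whitney formulation).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy example: a map that peaks exactly at the true fixation.
toy_map = np.zeros((4, 4))
toy_map[1, 1] = 1.0
toy_score = shuffled_auc(toy_map, [[1, 1]], [[0, 0], [3, 3]])
```

In this toy case the single true fixation outranks both borrowed locations, so `toy_score` is 1.0, while a constant map would give 0.5. The pairwise comparison is quadratic in the number of fixations, which is fine for a sketch but may need a rank-based implementation for large fixation sets.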