Robustness has become one of the most critical problems in machine learning (ML). The science of interpreting ML models to understand their behavior and improve their robustness is referred to as explainable artificial intelligence (XAI). One of the state-of-the-art XAI methods for computer vision problems is to generate saliency maps. A saliency map highlights the pixels of an image that excite the ML model the most. However, such maps can be misleading when spurious and salient features occupy overlapping pixel regions. In this paper, we propose a caption-based XAI method that integrates a standalone model to be explained into the contrastive language-image pre-training (CLIP) model using a novel network surgery approach. The resulting caption-based XAI model identifies the dominant concept that contributes most to the model's prediction. This explanation reduces the risk of the standalone model being misled by covariate shift and contributes significantly towards developing robust ML models. Our code is available at <https://github.com/patch0816/caption-driven-xai>.
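
As a rough illustration of the idea (a minimal sketch, not the authors' implementation; the module names, projection dimensions, and the linear-projection form of the network surgery are assumptions), the standalone model's features can be mapped into CLIP's shared image-text embedding space and ranked against candidate captions, so that the best-scoring caption names the dominant concept:

```python
# Conceptual sketch of caption-based explanation: graft the standalone
# model's features into CLIP's joint embedding space and score captions.
# StandaloneBackbone, feat_dim, and the Linear projection are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionXAI(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 clip_text_encoder: nn.Module, embed_dim: int = 512):
        super().__init__()
        self.backbone = backbone                # standalone model to be explained (frozen)
        self.text_encoder = clip_text_encoder   # CLIP text tower (frozen)
        # "network surgery" (assumed here to be a learned projection):
        # map the standalone model's features into CLIP's embedding space
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, images: torch.Tensor,
                caption_tokens: torch.Tensor) -> torch.Tensor:
        img = F.normalize(self.proj(self.backbone(images)), dim=-1)    # (B, D)
        txt = F.normalize(self.text_encoder(caption_tokens), dim=-1)   # (C, D)
        return img @ txt.T  # similarity of each image to each candidate caption

# Usage: the caption with the highest similarity is taken as the dominant
# concept driving the standalone model's prediction.
# scores = model(images, tokenized_captions)
# dominant_caption = scores.argmax(dim=-1)
```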