Class activation mapping, or CAM, has been a cornerstone of feature attribution methods for multiple vision tasks. Its simplicity and effectiveness have led to wide applications in the explanation of visual predictions and weakly-supervised localization tasks. However, CAM has its own shortcomings. The computation of attribution maps relies on ad-hoc calibration steps that are not part of the training computational graph, making it difficult to interpret the real meaning of the attribution values. In this paper, we improve CAM by explicitly incorporating a latent variable encoding the location of the cue for recognition in the formulation, thereby subsuming the attribution map into the training computational graph. The resulting model, class activation latent mapping, or CALM, is trained with the expectation-maximization algorithm. Our experiments show that CALM identifies discriminative attributes for image classifiers more accurately than CAM and other visual attribution baselines. CALM also shows performance improvements over prior art on weakly-supervised object localization benchmarks. Our code is available at https://github.com/naver-ai/calm.
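To make the critique concrete, the "ad-hoc calibration steps" refer to operations such as ReLU and min-max normalization applied to the raw weighted feature maps after training. Below is a minimal sketch of the standard CAM computation (not code from the paper); the function name, tensor shapes, and the specific calibration choices are illustrative assumptions:

```python
import numpy as np

def cam(feature_maps, fc_weights, class_idx):
    """Vanilla CAM: class-weighted sum of the last conv feature maps.

    feature_maps: (C, H, W) activations of the final convolutional layer.
    fc_weights:   (num_classes, C) weights of the final linear classifier.
    class_idx:    target class whose attribution map is computed.
    """
    w = fc_weights[class_idx]                   # (C,)
    m = np.tensordot(w, feature_maps, axes=1)   # (H, W) raw attribution scores
    # Ad-hoc calibration outside the training graph: ReLU, then
    # min-max normalization to [0, 1] for visualization.
    m = np.maximum(m, 0)
    return (m - m.min()) / (m.max() - m.min() + 1e-8)
```

CALM's key change is that the heatmap is no longer produced by such post-hoc steps: the location variable is part of the probabilistic model, so the map's values have a defined meaning (a posterior over cue locations) learned during training.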