When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each class helps us make our final decision. In this work, we introduce a deep network architecture that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, geologists, architects, and others would explain to people how to solve challenging image classification tasks. The network uses only image-level labels for training, meaning that no labels for parts of images are required. We demonstrate our method on the CUB-200-2011 dataset and the CBIS-DDSM dataset. Our experiments show that our interpretable network can achieve accuracy comparable to that of its analogous non-interpretable counterpart, as well as to other interpretable deep models.
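To make the described architecture concrete, the following is a minimal sketch (not the paper's exact formulation) of how a prototype-based classification head could be wired in PyTorch: spatial patches from a convolutional backbone are compared against learned prototype vectors, each prototype's evidence is its best match anywhere in the image, and a linear layer combines the evidence into class logits. The class name `PrototypeLayer`, the shape parameters, and the log-ratio similarity are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PrototypeLayer(nn.Module):
    """Hypothetical sketch of a prototype-based classification head."""

    def __init__(self, num_prototypes: int, proto_dim: int, num_classes: int):
        super().__init__()
        # Learned prototype vectors, each intended to capture a prototypical part.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, proto_dim))
        # Linear layer that combines per-prototype evidence into class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (batch, proto_dim, H, W) from a convolutional backbone.
        b, d, h, w = feature_map.shape
        patches = feature_map.permute(0, 2, 3, 1).reshape(b, h * w, d)
        # Squared L2 distance from every spatial patch to every prototype.
        protos = self.prototypes.unsqueeze(0).expand(b, -1, -1)
        dists = torch.cdist(patches, protos) ** 2  # (batch, H*W, num_prototypes)
        # Each prototype's evidence is its closest match anywhere in the image.
        min_dists, _ = dists.min(dim=1)  # (batch, num_prototypes)
        # Map distances to similarity scores (one illustrative choice).
        similarities = torch.log((min_dists + 1) / (min_dists + 1e-4))
        return self.classifier(similarities)


# Example: class scores for a batch of backbone feature maps.
layer = PrototypeLayer(num_prototypes=30, proto_dim=128, num_classes=200)
logits = layer(torch.randn(4, 128, 7, 7))  # -> (4, 200)
```

Taking the minimum distance (maximum similarity) over spatial locations means each prototype contributes evidence from its single best-matching patch, which is what allows a prediction to be traced back to specific parts of the image.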