Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret. One approach that offers some level of interpretability by design is \textit{hard attention}, which classifies using only relevant portions of the image. However, training hard attention models with only class label supervision is challenging, and hard attention has proven difficult to scale to complex datasets. Here, we propose a novel hard attention model, which we term Saccader. Key to Saccader is a pretraining step that requires only class labels and provides initial attention locations for policy gradient optimization. Our best models narrow the gap to common ImageNet baselines, achieving $75\%$ top-1 and $91\%$ top-5 accuracy while attending to less than one-third of the image.
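The training signal described above can be made concrete with a toy sketch: hard attention crops a single patch of the image, and because the crop is a discrete choice, the location policy is trained with a REINFORCE-style policy gradient using classification reward. Everything below is a minimal illustration, not the Saccader architecture: the grid of candidate locations, the tabular logits (in the paper these come from a convolutional network), and the toy reward function are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 6x6 grid of candidate attention locations.
GRID = 6
NUM_LOCS = GRID * GRID

# Policy logits over locations; in a real model these are produced by a
# network, here a plain table for illustration.
logits = np.zeros(NUM_LOCS)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def extract_glimpse(image, loc, size=8):
    """Hard attention: crop a size x size patch at the chosen grid cell."""
    r, c = divmod(loc, GRID)
    step = image.shape[0] // GRID
    top = min(r * step, image.shape[0] - size)
    left = min(c * step, image.shape[1] - size)
    return image[top:top + size, left:left + size]

def reinforce_step(image, reward_fn, lr=0.5):
    """One REINFORCE update: sample a location, score the resulting glimpse,
    and move the logits along reward * grad(log prob of sampled location)."""
    global logits
    probs = softmax(logits)
    loc = rng.choice(NUM_LOCS, p=probs)
    patch = extract_glimpse(image, loc)
    reward = reward_fn(patch, loc)
    # Gradient of log softmax w.r.t. logits: one-hot(loc) - probs.
    grad_logp = -probs
    grad_logp[loc] += 1.0
    logits = logits + lr * reward * grad_logp
    return loc, reward

# Toy "classifier reward": only the glimpse at grid cell 14 is informative,
# standing in for "the classifier is correct on this crop".
image = rng.random((48, 48))
reward_fn = lambda patch, loc: 1.0 if loc == 14 else 0.0

for _ in range(2000):
    reinforce_step(image, reward_fn)

best = int(np.argmax(softmax(logits)))  # policy concentrates on cell 14
```

The sparse, delayed reward here is exactly why pure policy gradient scales poorly to complex datasets: with many candidate locations and an uninformative initial policy, rewarded samples are rare. A pretraining step that supplies good initial attention locations, as the abstract describes, starts the policy near high-reward regions instead of uniform.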