Saliency methods have been widely used to highlight important input features in model predictions. Most existing methods use backpropagation on a modified gradient function to generate saliency maps. Thus, noisy gradients can result in unfaithful feature attributions. In this paper, we tackle this issue and introduce a {\it saliency guided training} procedure for neural networks to reduce noisy gradients used in predictions while retaining the predictive performance of the model. Our saliency guided training procedure iteratively masks features with small and potentially noisy gradients while maximizing the similarity of model outputs for both masked and unmasked inputs. We apply the saliency guided training procedure to various synthetic and real data sets from computer vision, natural language processing, and time series across diverse neural architectures, including Recurrent Neural Networks, Convolutional Networks, and Transformers. Through qualitative and quantitative evaluations, we show that the saliency guided training procedure significantly improves model interpretability across various domains while preserving the model's predictive performance.
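To make the masking-and-similarity idea concrete, the following is a minimal PyTorch sketch of one such training step. The function name \texttt{saliency\_guided\_step} and the hyperparameters \texttt{k} (number of masked features) and \texttt{lam} (similarity-loss weight) are illustrative, and masking by zeroing is only one possible masking choice, not necessarily the exact strategy used in the paper.

\begin{verbatim}
import torch
import torch.nn.functional as F

def saliency_guided_step(model, optimizer, x, y, k, lam):
    # One sketched training step: mask the k lowest-|gradient| input
    # features, then train with a classification loss plus a KL term
    # that keeps masked and unmasked outputs similar.
    # `k` and `lam` are hypothetical hyperparameters.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Gradient of the true-class score w.r.t. the input as saliency.
    score = logits.gather(1, y.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, x)
    # Indices of the k features with smallest gradient magnitude
    # for each example in the batch.
    low = grad.abs().flatten(1).topk(k, largest=False).indices
    x_masked = x.detach().clone().flatten(1)
    x_masked.scatter_(1, low, 0.0)  # mask by zeroing (one choice)
    x_masked = x_masked.view_as(x)
    # Cross-entropy on the original input, plus a KL term that
    # maximizes output similarity for masked and unmasked inputs.
    out = model(x.detach())
    out_masked = model(x_masked)
    loss = F.cross_entropy(out, y) + lam * F.kl_div(
        F.log_softmax(out_masked, dim=1),
        F.softmax(out, dim=1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
\end{verbatim}

Repeating this step over the training set yields a model whose low-gradient features have little influence on its output, which is the sense in which the procedure suppresses noisy gradients while retaining predictive performance.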