We propose a new attribution method for neural networks, derived from first principles of causality; to the best of our knowledge, it is the first such method. We view the neural network architecture as a Structural Causal Model and present a methodology to compute the causal effect of each feature on the output. Under reasonable assumptions on the causal structure of the input data, we propose algorithms that compute these causal effects efficiently and scale the approach to high-dimensional data. We also show how the method can be applied to recurrent neural networks. We report experimental results on both simulated and real datasets that showcase the promise and usefulness of the proposed algorithm.
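To make the idea of a per-feature causal effect concrete, the following is a minimal illustrative sketch, not the algorithm proposed here. It assumes that input features do not cause one another, so an intervention do(x_i = alpha) can be simulated by clamping feature i to alpha and averaging the model output over the data. The function names (`interventional_expectation`, `average_causal_effect`), the choice of baseline (the mean interventional expectation over the tested values), and the toy model are all hypothetical choices made for illustration.

```python
import numpy as np


def interventional_expectation(model, X, feature, alpha):
    """Monte Carlo estimate of E[y | do(x_feature = alpha)].

    Assumption (for this sketch only): features do not cause one another,
    so clamping one feature leaves the distribution of the others unchanged.
    """
    X_do = np.array(X, dtype=float, copy=True)
    X_do[:, feature] = alpha  # simulate the intervention by clamping
    return float(np.mean(model(X_do)))


def average_causal_effect(model, X, feature, alphas):
    """Interventional expectation at each alpha, relative to a baseline.

    The baseline here is the mean interventional expectation over `alphas`
    (an assumed choice, not taken from the text above).
    """
    expectations = np.array(
        [interventional_expectation(model, X, feature, a) for a in alphas]
    )
    return expectations - expectations.mean()


def toy_model(X):
    """Hypothetical stand-in for a trained network: depends only on feature 0."""
    return 2.0 * X[:, 0] + 1.0


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
alphas = np.linspace(-1.0, 1.0, 5)

ace_x0 = average_causal_effect(toy_model, X, 0, alphas)  # varies with alpha
ace_x1 = average_causal_effect(toy_model, X, 1, alphas)  # flat: no effect
```

In this toy setting the effect of feature 0 grows linearly with the intervention value, while feature 1, which the model ignores, has an effect of zero at every value. A real network would replace `toy_model`, and correlated or causally related inputs would require a more careful treatment than simple clamping.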