One of the key current challenges in Explainable AI is the correct interpretation of activations of hidden neurons: accurate interpretations would provide insight into what a deep learning system has internally \emph{detected} as relevant in the input, thus lifting some of the black-box character of deep learning systems. The state of the art indicates that hidden node activations can, at least in some cases, be interpreted in a way that makes sense to humans. Yet systematic automated methods that would first hypothesize an interpretation of hidden neuron activations and then verify it are mostly missing. In this paper, we provide such a method and demonstrate that it yields meaningful interpretations. Our approach is based on large-scale background knowledge -- a class hierarchy of approximately 2 million classes curated from the Wikipedia Concept Hierarchy -- combined with a symbolic reasoning approach called \emph{concept induction}, which is based on description logics and was originally developed for applications in the Semantic Web field. Our results show that, through a hypothesis-and-verification process, we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network.
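To make the hypothesis-and-verification loop concrete, the following minimal Python sketch shows how such a procedure could be structured; it is not the implementation used in the paper. The helpers \texttt{get\_activation}, \texttt{concept\_induction}, and \texttt{retrieve\_images\_for\_class} are hypothetical stand-ins for the network's dense-layer activations, the description-logic concept-induction reasoner over the background class hierarchy, and a source of fresh images, respectively; the thresholds are likewise assumed for illustration.

\begin{verbatim}
# Illustrative sketch of a hypothesis-and-verification loop for
# labeling a single dense-layer neuron. All helpers and thresholds
# are hypothetical placeholders, not the paper's actual pipeline.
from typing import Callable, List, Tuple

def hypothesize_and_verify(
    neuron: int,
    probe_images: List[str],
    get_activation: Callable[[str, int], float],
    concept_induction: Callable[[List[str], List[str]], str],
    retrieve_images_for_class: Callable[[str], List[str]],
    threshold: float = 0.5,        # activation cutoff (assumed)
    min_verify_rate: float = 0.8,  # fraction of fresh images that must fire
) -> Tuple[str, bool]:
    # Step 1: split probe images by whether the neuron activates.
    positives = [im for im in probe_images
                 if get_activation(im, neuron) > threshold]
    negatives = [im for im in probe_images
                 if get_activation(im, neuron) <= threshold]

    # Step 2 (hypothesis): ask the concept-induction reasoner for a
    # class from the background hierarchy covering the positives
    # while excluding the negatives.
    label = concept_induction(positives, negatives)

    # Step 3 (verification): unseen images of the hypothesized class
    # should also activate the neuron at a sufficient rate.
    fresh = retrieve_images_for_class(label)
    rate = sum(get_activation(im, neuron) > threshold
               for im in fresh) / max(len(fresh), 1)
    return label, rate >= min_verify_rate
\end{verbatim}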