Adversarial TCAV -- Robust and Effective Interpretation of Intermediate Layers in Neural Networks

Interpreting neural network decisions and the information learned in intermediate layers remains a challenge because of opaque internal states and shared non-linear interactions. Although Kim et al. (2017) proposed interpreting an intermediate layer by quantifying its ability to distinguish a user-defined concept from random examples, questions of robustness (variation with the choice of random examples) and effectiveness (retrieval rate of concept images) remain. We investigate these two properties and propose improvements that make concept activations reliable for practical use. Effectiveness: if an intermediate layer has effectively learned a user-defined concept, it should be able to recall, at test time, most of the images containing that concept. For instance, we observed that the recall rate of Tiger shark and Great white shark images from the ImageNet dataset, with "Fins" as the user-defined concept, was only 18.35% for VGG16. To increase the effectiveness of concept learning, we propose A-CAV, the Adversarial Concept Activation Vector, which results in larger margins between user concepts and (negative) random examples. This approach improves the aforesaid recall to 76.83% for VGG16. Robustness: we define robustness as the ability of an intermediate layer to be consistent in its recall rate (its effectiveness) across different random seeds. We observed that TCAV has a large variance in recalling a concept across different random seeds. For example, the recall of cat images (from a layer learning the concept of tail) varies from 18% to 86%, with a standard deviation of 20.85%, on VGG16. We propose a simple and scalable modification that employs a Gram-Schmidt process to sample random noise from concepts and learn an average "concept classifier". This approach reduces the aforesaid standard deviation from 20.85% to 6.4%.
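The two quantities discussed above, recall of held-out concept examples and its spread across random seeds, can be illustrated with a minimal sketch. It assumes layer activations are already available as NumPy arrays and substitutes a plain least-squares linear classifier for the concept classifier. The function names (`fit_linear_cav`, `recall`, `recall_across_seeds`) and the synthetic data are hypothetical and for illustration only. The sketch does not reproduce A-CAV or the Gram-Schmidt-based modification.

```python
import numpy as np


def fit_linear_cav(concept_acts, random_acts):
    """Fit a linear concept-vs-random classifier by least squares.

    Assumption: a least-squares fit stands in for whatever linear
    classifier is used in practice. Returns (w, b) with w scaled to
    unit norm; w plays the role of a concept activation vector.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), -np.ones(len(random_acts))])
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    w, b = coef[:-1], coef[-1]
    norm = np.linalg.norm(w)
    # Dividing w and b by the same positive constant keeps the decision rule.
    return w / norm, b / norm


def recall(w, b, held_out_concept_acts):
    """Fraction of held-out concept examples classified as the concept."""
    return float(np.mean(held_out_concept_acts @ w + b > 0))


def recall_across_seeds(concept_train, concept_test, random_pool, n_random, seeds):
    """Refit with a different random negative set per seed.

    Returns the per-seed recalls, their mean, and their standard deviation.
    """
    recalls = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(random_pool), size=n_random, replace=False)
        w, b = fit_linear_cav(concept_train, random_pool[idx])
        recalls.append(recall(w, b, concept_test))
    recalls = np.array(recalls)
    return recalls, recalls.mean(), recalls.std()


if __name__ == "__main__":
    # Synthetic stand-in for layer activations (not real network data).
    rng = np.random.default_rng(0)
    dim = 16
    shift = np.zeros(dim)
    shift[0] = 1.5  # concept examples are shifted along one direction
    concept_train = rng.normal(size=(50, dim)) + shift
    concept_test = rng.normal(size=(50, dim)) + shift
    random_pool = rng.normal(size=(500, dim))

    recalls, mean_r, std_r = recall_across_seeds(
        concept_train, concept_test, random_pool, n_random=50, seeds=range(10)
    )
    print("per-seed recall:", np.round(recalls, 3))
    print(f"mean recall = {mean_r:.3f}, std across seeds = {std_r:.3f}")
```

A low standard deviation across seeds corresponds to the robustness property in the sense used here, and a high mean recall corresponds to effectiveness.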
