This paper reviews recent studies in emerging directions for understanding neural-network representations and for learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been their Achilles' heel. At present, deep neural networks achieve high discrimination power at the cost of the low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communication at the semantic level, and semantically debugging network representations. In this paper, we focus on convolutional neural networks (CNNs) and revisit the visualization of CNN representations, methods for diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, the learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.