In this paper, we uniquely study the adversarial robustness of deep neural networks (NNs) for classification tasks against that of optimal classifiers. We examine the smallest magnitude of additive perturbation that can change a classifier's output. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that a neural network's adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness of optimal classifiers. Our theory matches remarkably well with numerical experiments on practically trained NNs, including NNs for ImageNet images. The matrix-theoretic explanation is consistent with an earlier information-theoretic, feature-compression-based explanation of the adversarial fragility of neural networks.
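To fix intuition for the $1/\sqrt{d}$ scaling, consider the following minimal sketch for a binary linear classifier $\operatorname{sign}(w^\top x + b)$; this toy setting and the random-direction assumption below are ours for illustration only and are not the construction analyzed in the paper. The smallest additive perturbation that changes the classifier's output equals the distance from the input $x$ to the decision hyperplane:
\[
\inf\Bigl\{ \|\delta\|_2 \;:\; \operatorname{sign}\bigl(w^\top (x+\delta) + b\bigr) \neq \operatorname{sign}\bigl(w^\top x + b\bigr) \Bigr\} \;=\; \frac{|w^\top x + b|}{\|w\|_2}.
\]
If, purely as an illustrative assumption, $w$ and $x$ are independent and uniformly distributed on the unit sphere in $\mathbb{R}^d$ with $b = 0$, then $\mathbb{E}\bigl[\,|w^\top x|\,\bigr] = \Theta(1/\sqrt{d})$, so this minimal output-changing perturbation shrinks at the $1/\sqrt{d}$ rate as the input dimension grows; the paper's analysis concerns the sharper comparison between trained NNs and optimal classifiers, which this sketch does not capture.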