Feature Purification: How Adversarial Training Performs Robust Deep Learning

Despite the empirical success of adversarial training in defending deep learning models against adversarial perturbations, it remains unclear what principles underlie the existence of adversarial perturbations and what adversarial training does to a neural network to remove them. In this paper, we present a principle that we call Feature Purification: one cause of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network, and, more importantly, one goal of adversarial training is to remove such mixtures and thereby purify the hidden weights. We present experiments on the CIFAR-10 dataset to illustrate this principle, together with a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using gradient descent from random initialization indeed satisfies this principle. Technically, we give, to the best of our knowledge, the first result proving that the following two statements can hold simultaneously for training a neural network with ReLU activation: (1) training on the original data yields a network that is non-robust to small adversarial perturbations of some radius; (2) adversarial training, even with an empirical perturbation algorithm such as the fast gradient method (FGM), is provably robust against ANY perturbation of the same radius. Finally, we also prove a complexity lower bound showing that low-complexity models, such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network, CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.
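
To make the setting concrete, here is a minimal sketch of an FGM perturbation and one adversarial training step for a two-layer ReLU network. It is an illustration under stated assumptions, not the paper's construction: the network form f(x) = sum_i a_i * ReLU(<w_i, x>) with fixed output weights `a`, the logistic loss, the l2-normalized gradient step, and all function names and toy values are assumed here for illustration.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def forward(W, a, x):
    """Assumed network: f(x) = sum_i a_i * ReLU(<w_i, x>)."""
    return sum(ai * max(0.0, dot(w, x)) for w, ai in zip(W, a))

def logistic_loss(W, a, x, y):
    """Logistic loss log(1 + exp(-y * f(x))) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * forward(W, a, x)))

def loss_slope(W, a, x, y):
    """Derivative of the logistic loss with respect to the output f(x)."""
    return -y / (1.0 + math.exp(y * forward(W, a, x)))

def input_gradient(W, a, x, y):
    """Gradient of the loss with respect to the input x."""
    s = loss_slope(W, a, x, y)
    grad = [0.0] * len(x)
    for w, ai in zip(W, a):
        if dot(w, x) > 0:  # only active ReLU units contribute
            for j, wj in enumerate(w):
                grad[j] += s * ai * wj
    return grad

def fgm_perturb(W, a, x, y, eps):
    """FGM (l2 variant): move x by eps along the normalized loss gradient."""
    g = input_gradient(W, a, x, y)
    norm = math.sqrt(dot(g, g))
    if norm == 0.0:
        return list(x)  # zero gradient: leave the input unchanged
    return [xj + eps * gj / norm for xj, gj in zip(x, g)]

def adversarial_training_step(W, a, x, y, eps, lr):
    """One gradient step on the hidden weights W, taken at the FGM-perturbed input."""
    x_adv = fgm_perturb(W, a, x, y, eps)
    s = loss_slope(W, a, x_adv, y)
    new_W = []
    for w, ai in zip(W, a):
        active = dot(w, x_adv) > 0
        new_W.append([wj - lr * s * ai * xj if active else wj
                      for wj, xj in zip(w, x_adv)])
    return new_W

# Toy values, chosen only for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, -1.0]
x, y, eps = [1.0, 0.5], 1.0, 0.1
x_adv = fgm_perturb(W, a, x, y, eps)
W_new = adversarial_training_step(W, a, x, y, eps, lr=0.1)
```

In this toy case the perturbed input stays within l2 distance `eps` of `x` and has a higher loss than `x`, while the weight update lowers the loss at the perturbed input. Repeating the step over a dataset gives the basic adversarial training loop. Whether the result is robust to every perturbation of that radius is the question the paper's theory addresses; this sketch does not demonstrate it.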
