In adversarial machine learning, deep neural networks can fit adversarial examples on the training set yet generalize poorly on the test set. This phenomenon, called robust overfitting, is observed when adversarially training neural networks on common datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet. In this paper, we study the robust overfitting issue of adversarial training using tools from uniform stability. One major challenge is that the outer function (defined as a maximization of the inner function) is nonsmooth, so the standard technique (e.g., Hardt et al., 2016) cannot be applied. Our approach is to consider $\eta$-approximate smoothness: we show that the outer function satisfies this modified smoothness assumption with $\eta$ being a constant related to the adversarial perturbation. Based on this, we derive stability-based generalization bounds for stochastic gradient descent (SGD) on the general class of $\eta$-approximate smooth functions, which covers the adversarial loss. Our results provide a different understanding of robust overfitting from the perspective of uniform stability. Additionally, we show that a few popular techniques for adversarial training (\emph{e.g.,} early stopping, cyclic learning rate, and stochastic weight averaging) are stability-promoting in theory.
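To make the min-max structure of the adversarial loss concrete, the following is a minimal sketch of adversarial training with SGD on a linear (logistic) model, where the inner maximization over an $\ell_\infty$ perturbation ball has a closed form. All names (`inner_max`, `adv_train`, the budget `eps`) are illustrative, not from the paper.

```python
import numpy as np

def loss(w, x, y):
    # Logistic loss for one example, with label y in {-1, +1}.
    return np.log1p(np.exp(-y * np.dot(w, x)))

def grad_w(w, x, y):
    # Gradient of the logistic loss with respect to the weights w.
    s = -y / (1.0 + np.exp(y * np.dot(w, x)))
    return s * x

def inner_max(w, x, y, eps):
    # Inner maximization: for a linear model, the worst-case l_inf
    # perturbation of budget eps moves each coordinate against the margin.
    return x - eps * y * np.sign(w)

def adv_train(X, Y, eps=0.1, lr=0.5, epochs=100, seed=0):
    # Outer minimization by SGD on the adversarial loss
    # max_{||delta|| <= eps} loss(w, x + delta, y).
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x_adv = inner_max(w, X[i], Y[i], eps)  # inner maximization
            w -= lr * grad_w(w, x_adv, Y[i])       # outer SGD step
    return w
```

Note that the outer objective here is a pointwise maximum of smooth functions and is therefore nonsmooth in general, which is exactly the obstacle the $\eta$-approximate smoothness analysis is designed to handle.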