The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, yet lead to very different outputs from a deep-learning algorithm. Although a significant body of work on defense models has been developed, most such models are heuristic and often vulnerable to adaptive attacks. Defense methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness for large-scale models and data. To address these limitations, we introduce a scalable framework that provides certified bounds on the norm of the input manipulation required to construct adversarial examples. We establish a connection between robustness against adversarial perturbations and additive random noise, and propose a training strategy that significantly improves the certified bounds. Our evaluation on MNIST, CIFAR-10, and ImageNet suggests that our method scales to complicated models and large data sets, while providing robustness competitive with state-of-the-art provable defense methods.
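To make the connection between additive random noise and certified robustness concrete, the sketch below illustrates the general certify-via-noise idea in Python. It is not the paper's exact procedure: the radius formula shown is the well-known Gaussian-smoothing bound R = (σ/2)(Φ⁻¹(p_A) − Φ⁻¹(p_B)) from related work on randomized smoothing, and `base_classifier`, `sigma`, and `n_samples` are illustrative assumptions rather than quantities defined in the abstract.

```python
# Minimal sketch of certification via additive Gaussian noise; NOT the
# paper's exact method. The radius formula is the standard randomized-
# smoothing bound from related work; all names below are assumptions.
import numpy as np
from scipy.stats import norm

def certified_l2_radius(base_classifier, x, sigma=0.25,
                        n_samples=1000, num_classes=10):
    """Monte-Carlo estimate of a certified L2 radius for the smoothed
    classifier g(x) = argmax_c P[base_classifier(x + noise) = c].

    base_classifier: assumed to map a batch of inputs to integer labels.
    """
    # Classify many noisy copies of x under additive Gaussian noise.
    noisy = x[None, ...] + sigma * np.random.randn(n_samples, *x.shape)
    votes = np.bincount(base_classifier(noisy), minlength=num_classes)

    # Empirical probabilities of the runner-up and top classes.
    runner_up, top = np.argsort(votes)[-2:]
    p_b = min(max(votes[runner_up] / n_samples, 1e-6), 1.0 - 1e-6)
    p_a = min(max(votes[top] / n_samples, 1e-6), 1.0 - 1e-6)

    if p_a <= p_b:          # no margin between top two classes: no certificate
        return top, 0.0

    # A larger margin between the top two classes yields a larger
    # certified radius; noise-augmented training enlarges this margin.
    radius = 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))
    return top, radius
```

The training strategy mentioned in the abstract fits this picture: training the base classifier on noise-corrupted inputs increases the margin p_A − p_B under noise, which directly enlarges the certified radius.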