Research on adversarial examples in computer vision tasks has shown that small, often imperceptible changes to an image can induce misclassifications, which has security implications for a wide range of image processing systems. Considering $L_2$ norm distortions, the Carlini and Wagner attack is presently the most effective white-box attack in the literature. However, this method is slow, since it performs a line-search over one of the optimization terms and often requires thousands of iterations. In this paper, an efficient approach is proposed to generate gradient-based attacks that induce misclassifications with low $L_2$ norm, by decoupling the direction and the norm of the adversarial perturbation that is added to the image. Experiments conducted on the MNIST, CIFAR-10 and ImageNet datasets indicate that our attack achieves results comparable to the state-of-the-art (in terms of $L_2$ norm) with considerably fewer iterations (as few as 100), which opens the possibility of using these attacks for adversarial training. Models trained with our attack achieve state-of-the-art robustness against white-box gradient-based $L_2$ attacks on the MNIST and CIFAR-10 datasets, outperforming the Madry defense when the attacks are limited to a maximum norm.
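To make the decoupling concrete, below is a minimal PyTorch sketch of the idea: the gradient of the loss sets only the *direction* of each update, while a separate multiplicative schedule grows or shrinks the $L_2$ *norm* budget depending on whether the current point is already adversarial. This is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm (the published method includes refinements omitted here, such as tracking the best adversarial example found); the classifier `model`, the step size `alpha`, and the norm-adjustment factor `gamma` are assumed names with illustrative values.

```python
import torch
import torch.nn.functional as F

def decoupled_l2_attack(model, x, label, steps=100, alpha=0.05, gamma=0.05):
    """Illustrative sketch of a direction/norm-decoupled L2 attack.

    Assumes `model` is a PyTorch classifier and `x` is a single image
    with a batch dimension, shape (1, C, H, W), values in [0, 1].
    Hyperparameters are illustrative, not the paper's tuned values.
    """
    eps = 1.0                        # current L2 norm budget, adjusted every step
    delta = torch.zeros_like(x)      # adversarial perturbation
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), label)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Direction: an L2-normalized gradient ascent step on the loss.
            delta = delta + alpha * grad / grad.norm().clamp_min(1e-12)
            # Norm: shrink the budget if the current point is adversarial,
            # grow it otherwise -- the decoupled norm schedule.
            is_adv = model(x + delta).argmax(1) != label
            eps *= (1 - gamma) if is_adv.item() else (1 + gamma)
            # Project the perturbation onto the L2 sphere of radius eps
            # and keep the perturbed image inside the valid pixel range.
            delta = delta * (eps / delta.norm().clamp_min(1e-12))
            delta = (x + delta).clamp(0, 1) - x
    return x + delta
```

The point of this structure is that no line-search over a loss-weighting constant is needed, in contrast to the Carlini and Wagner attack: the binary adversarial/non-adversarial signal alone drives the norm budget, which is why far fewer iterations suffice.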