We propose new, more efficient targeted white-box attacks against deep neural networks. Our attacks better align with the attacker's goal: (1) tricking a model into assigning higher probability to the target class than to any other class, while (2) staying within an $\epsilon$-distance of the attacked input. First, we demonstrate a loss function that explicitly encodes (1) and show that Auto-PGD finds more attacks with it. Second, we propose a new attack method, Constrained Gradient Descent (CGD), using a refinement of our loss function that captures both (1) and (2). CGD seeks to satisfy both attacker objectives -- misclassification and bounded $\ell_{p}$-norm -- in a principled manner, as part of the optimization, instead of via ad hoc post-processing techniques (e.g., projection or clipping). We show that CGD is more successful on CIFAR10 (0.9--4.2%) and ImageNet (8.6--13.6%) than state-of-the-art attacks while consuming less time (11.4--18.8%). Statistical tests confirm that our attack outperforms others against leading defenses on different datasets and values of $\epsilon$.
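As a rough illustration of objective (1), a margin-style targeted loss can encode "the target class must receive a higher score than every other class." The sketch below is only an assumption of one such formulation, not necessarily the exact loss used in the paper; the tensor names (`logits`, `target`) are hypothetical placeholders.

```python
# Minimal sketch, assuming a margin-style encoding of objective (1):
# push the target-class logit above the largest competing logit.
import torch

def targeted_margin_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Becomes negative once the target class has the highest logit.

    logits: (batch, num_classes) model outputs.
    target: (batch,) target class indices chosen by the attacker.
    """
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    # Mask out the target class, then take the largest competing logit.
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))
    max_other = masked.max(dim=1).values
    # Minimizing (max_other - target_logit) drives the target class on top.
    return (max_other - target_logit).mean()
```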