Generating feasible adversarial examples is necessary to properly assess models that operate on constrained feature spaces. However, enforcing domain constraints in attacks originally designed for computer vision remains a challenging task. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework supports the use cases reported in the literature and can handle both linear and non-linear constraints. We instantiate our framework in two algorithms: a gradient-based attack that incorporates the constraints into the loss function it maximizes, and a multi-objective search algorithm that jointly targets misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective on two datasets from different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose introducing engineered non-convex constraints to improve model adversarial robustness. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms a starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.
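The gradient-based variant can be illustrated with a minimal sketch: a penalized objective that rewards misclassification while punishing constraint violation, ascended by gradient steps. Everything below (the toy logistic model, the single linear constraint `x[0] + x[1] <= 1`, the penalty weight `lam`) is a hypothetical stand-in for illustration, not the paper's actual models or constraints.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logistic-regression victim model (assumed weights, not from the paper).
w = np.array([2.0, -1.5])
b = 0.1

def model_loss(x, y):
    # Binary cross-entropy: the quantity the attacker wants to increase.
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def constraint_violation(x):
    # Hinge penalty for a hypothetical linear domain constraint g(x) <= 0,
    # here g(x) = x[0] + x[1] - 1. Zero when the constraint holds.
    return max(0.0, x[0] + x[1] - 1.0)

def attack_objective(x, y, lam=10.0):
    # Maximize model loss, minimize constraint violation (lam is assumed).
    return model_loss(x, y) - lam * constraint_violation(x)

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences; keeps the sketch free of hand-derived math.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def constrained_attack(x0, y, steps=200, lr=0.05):
    # Plain gradient ascent on the penalized objective.
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * numeric_grad(lambda z: attack_objective(z, y), x)
    return x

x0 = np.array([0.2, 0.3])      # a feasible starting point (sum <= 1)
x_adv = constrained_attack(x0, y=1)
```

The penalty term steers the search toward examples that both fool the model and remain feasible; the multi-objective search algorithm pursues the same three goals without folding them into a single scalar objective.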