Although deep neural networks have been widely applied in many application domains, they have been found to be vulnerable to adversarial attacks. A promising set of attack techniques has recently been proposed, mainly focusing on generating adversarial examples in digital-world settings. Unfortunately, such strategies are not implementable in physical-world scenarios such as autonomous driving. In this paper, we present FragGAN, a new GAN-based framework capable of generating an adversarial image that differs from the original input image only in a targeted fragment, which is replaced with a corresponding, visually indistinguishable adversarial fragment. FragGAN ensures that the resulting image as a whole constitutes an effective attack. For a physical-world implementation, an attacker could physically print the adversarial fragment and paste it over the original fragment (e.g., on a roadside sign in autonomous driving scenarios). FragGAN also enables clean-label attacks against image classification, as the resulting attacks may succeed even without modifying any essential content of an image. Extensive experiments, including physical-world case studies on state-of-the-art autonomous steering and image classification models, demonstrate that FragGAN is highly effective and superior to straightforward extensions of existing approaches. To the best of our knowledge, FragGAN is the first approach that can mount effective, clean-label physical-world attacks.