Adversarial attacks threaten the deployment of deep neural networks in security-sensitive scenarios. Most existing black-box attacks fool the target model by querying it many times and producing global perturbations. However, global perturbations also alter the smooth, insignificant background, which both makes the perturbation more easily perceived and increases the query overhead. In this paper, we propose a novel framework that, under a limited query budget, perturbs only the discriminative areas of clean examples in black-box attacks. Our framework is built on two types of transferability. The first is the transferability of model interpretations: based on this property, we can easily identify the discriminative areas of a given clean example for local perturbation. The second is the transferability of adversarial examples, which lets us produce a local pre-perturbation that improves query efficiency. After identifying the discriminative areas and pre-perturbing, we generate the final adversarial example from the pre-perturbed example by querying the target model with two kinds of black-box attack techniques, i.e., gradient estimation and random search. We conduct extensive experiments showing that our framework significantly improves query efficiency during black-box perturbation while maintaining a high attack success rate, and that our attacks outperform state-of-the-art black-box attacks under various system settings.
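To make the final querying stage concrete, below is a minimal sketch of gradient estimation restricted to the discriminative area, assuming an NES-style antithetic finite-difference estimator and a precomputed binary mask; the names `loss_fn`, `masked_nes_gradient`, and `local_attack` are illustrative, not the paper's implementation. Each call to `loss_fn` corresponds to one query of the target model, which is why confining the noise to the mask reduces the search space and the query cost.

```python
import numpy as np

def masked_nes_gradient(loss_fn, x, mask, sigma=0.01, n_samples=50):
    """Estimate the loss gradient w.r.t. x by NES-style antithetic sampling,
    restricted to the discriminative region given by a binary mask.

    loss_fn: callable mapping an image to a scalar loss (one model query each).
    x:       clean or pre-perturbed image, shape (H, W, C), values in [0, 1].
    mask:    binary array of the same shape; 1 marks discriminative pixels.
    """
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape) * mask  # sample noise only inside the mask
        grad += u * (loss_fn(np.clip(x + sigma * u, 0, 1))
                     - loss_fn(np.clip(x - sigma * u, 0, 1)))
    return grad / (2 * sigma * n_samples)

def local_attack(loss_fn, x, mask, eps=0.05, lr=0.01, steps=20):
    """Iteratively perturb only the masked area, keeping the perturbation
    within an L_inf ball of radius eps around the input."""
    x_adv = x.astype(np.float64).copy()
    for _ in range(steps):
        g = masked_nes_gradient(loss_fn, x_adv, mask)
        x_adv += lr * np.sign(g) * mask       # ascend the estimated loss locally
        x_adv = np.clip(np.clip(x_adv, x - eps, x + eps), 0, 1)
    return x_adv
```

The random-search alternative mentioned in the abstract would replace the gradient step with randomly proposed local patches that are kept only when they increase the loss, again confined to the same mask.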