Attack to Fool and Explain Deep Networks

Deep visual models are susceptible to adversarial perturbations of their inputs. Although these signals are carefully crafted, they still appear as noise-like patterns to humans. This observation has led to the argument that deep visual representation is misaligned with human perception. We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations. We first propose an attack that fools a network into confusing a whole category of objects (the source class) with a target label. Our attack also limits unintended fooling by samples from non-source classes, thereby circumscribing human-defined semantic notions for network fooling. We show that the proposed attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we alter the 'adversarial' objective of our attack to use it as a tool to 'explain' deep visual representation. We show that by careful channeling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions. Finally, we exploit the explainability properties of our perturbations to perform image generation, inpainting and interactive image manipulation by attacking adversarially robust 'classifiers'. In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret visual models. The article also makes secondary contributions by establishing the utility of our attack beyond the adversarial objective through multiple interesting applications.
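To make the attack objective concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy linear softmax classifier in place of a deep network, a single shared perturbation `delta`, an L-infinity budget `eps`, and plain projected gradient descent. The function name `class_targeted_perturbation` and all data values are hypothetical. The loss pushes every source-class sample toward the target label while asking non-source samples to keep their own labels.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict(W, b, x):
    return np.argmax(x @ W + b, axis=1)

def class_targeted_perturbation(W, b, x_src, target, x_other, y_other,
                                eps=1.0, lr=0.1, steps=200):
    """One shared perturbation for a linear softmax model (toy assumption).

    Minimises the mean cross-entropy where source samples are asked to
    take the target label and non-source samples keep their own labels.
    """
    n_classes = W.shape[1]
    x_all = np.vstack([x_src, x_other])
    y_all = np.concatenate([np.full(len(x_src), target), y_other])
    onehot = np.eye(n_classes)[y_all]
    delta = np.zeros(x_all.shape[1])
    for _ in range(steps):
        probs = softmax((x_all + delta) @ W + b)
        # Gradient of the mean cross-entropy with respect to delta.
        grad = ((probs - onehot) @ W.T).mean(axis=0)
        # Gradient step, then projection back onto the L-infinity ball.
        delta = np.clip(delta - lr * grad, -eps, eps)
    return delta

# Hypothetical 2-feature, 3-class model: class 0 is the source class,
# class 1 is the target label, class 2 is a non-source class.
W = np.array([[-2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
b = np.zeros(3)
x_src = np.array([[-0.3, 0.0], [-0.1, 0.0]])
x_other = np.array([[0.0, 2.0], [0.2, 2.5]])
y_other = np.array([2, 2])

delta = class_targeted_perturbation(W, b, x_src, 1, x_other, y_other)
src_before = predict(W, b, x_src)
src_after = predict(W, b, x_src + delta)
other_after = predict(W, b, x_other + delta)
print(delta, src_before, src_after, other_after)
```

In this toy setting the same `delta` is added to every input, so it acts on the whole source category rather than on a single image. The second loss term is what keeps the non-source samples from being fooled as a side effect.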

