Deep neural networks have been shown to perform well on many classical machine learning problems, especially image classification tasks. However, researchers have found that neural networks can be easily fooled: they are surprisingly sensitive to small perturbations imperceptible to humans. Carefully crafted input images (adversarial examples) can force a well-trained neural network to produce arbitrary outputs. Including adversarial examples during training is a popular defense mechanism against adversarial attacks. In this paper, we propose a new defense mechanism under the generative adversarial network (GAN) framework. We model the adversarial noise with a generative network, trained jointly with a discriminative classification network as a minimax game. We show empirically that our adversarial network approach works well against black-box attacks, with performance on par with state-of-the-art methods such as ensemble adversarial training and adversarial training with projected gradient descent.
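The joint minimax training described above can be illustrated with a minimal toy sketch. This is our own simplified assumption, not the paper's architecture: the "discriminator" is a logistic-regression classifier, and the "generator" learns a single bounded universal perturbation `delta = eps * tanh(u)` rather than an input-conditioned noise network. The classifier descends the cross-entropy loss on perturbed inputs while the generator ascends it.

```python
import numpy as np

# Toy minimax game between a classifier and an adversarial-noise generator.
# Assumptions (ours, for illustration only): linear logistic classifier,
# generator parameterized as a learned universal perturbation bounded by eps.
rng = np.random.default_rng(0)
n, d, eps = 200, 5, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(float)

w = np.zeros(d)   # classifier ("discriminator") weights
u = np.zeros(d)   # generator parameters
lr_d, lr_g = 0.1, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    delta = eps * np.tanh(u)              # bounded adversarial perturbation
    p = sigmoid((X + delta) @ w)          # classifier output on perturbed input
    err = (p - y) / n                     # d(mean cross-entropy)/d(logit)
    grad_w = (X + delta).T @ err          # gradient w.r.t. classifier weights
    grad_delta = err.sum() * w            # gradient w.r.t. the perturbation
    grad_u = grad_delta * eps * (1.0 - np.tanh(u) ** 2)
    w -= lr_d * grad_w                    # classifier minimizes the loss
    u += lr_g * grad_u                    # generator maximizes the loss

clean_acc = float(((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean())
max_pert = float(np.max(np.abs(eps * np.tanh(u))))
```

In the full method, the universal perturbation would be replaced by a generative network conditioned on the input, and both players would be deep networks updated by alternating stochastic gradient steps, but the alternating ascent/descent structure is the same.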