Adversarial examples are carefully crafted inputs designed to fool machine learning classifiers. In recent years, adversarial machine learning has been studied extensively, especially perturbation-based adversarial examples, in which a perturbation imperceptible to humans is added to an image. Adversarial training can be used to achieve robustness against such inputs. Another type is the invariance-based adversarial example, in which an image is modified semantically so that the model's predicted class does not change, but the class determined by humans does. How to achieve robustness against this type of adversarial example has not yet been explored. This work examines the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN). We show that adversarial training with both invariance-based and perturbation-based adversarial examples should be conducted simultaneously rather than consecutively. This procedure can achieve relatively high robustness against both types of adversarial examples. Additionally, we find that the algorithm used in prior work to generate invariance-based adversarial examples does not correctly determine the labels, and we therefore use human-determined labels.
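The following is a minimal sketch, not the authors' implementation, of what "simultaneous" adversarial training could look like in PyTorch: perturbation-based examples (here crafted with standard PGD) and pre-generated invariance-based examples with human-determined labels are combined into a single batch and a single loss, rather than being trained on in separate, consecutive phases. The tensors `x_inv`, `y_inv`, the `pgd_attack` helper, and all hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Sketch of one "simultaneous" adversarial training step (assumptions noted above).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD to craft perturbation-based adversarial examples."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def mixed_adversarial_step(model, optimizer, x_clean, y_clean, x_inv, y_inv):
    """One optimization step on perturbation-based and invariance-based
    adversarial examples together (single combined loss, not consecutive phases)."""
    model.eval()                                  # freeze BN statistics while attacking
    x_pert = pgd_attack(model, x_clean, y_clean)  # perturbation-based examples
    model.train()
    # Invariance-based examples (x_inv) are assumed to be pre-generated and
    # labeled by humans (y_inv); both kinds share one batch and one loss.
    x_batch = torch.cat([x_pert, x_inv], dim=0)
    y_batch = torch.cat([y_clean, y_inv], dim=0)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training consecutively would instead run one of these example types to convergence before the other; the abstract's claim is that the mixed, single-loss variant retains robustness to both.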