Adversarial training is an effective defense method to protect classification models against adversarial attacks. However, one limitation of this approach is that it can require orders of magnitude more training time than standard training, due to the high cost of generating strong adversarial examples during training. In this paper, we first show that there is high transferability between models from neighboring epochs of the same training process, i.e., adversarial examples generated in one epoch remain adversarial in subsequent epochs. Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), which enhances the robustness of trained models and greatly improves training efficiency by accumulating adversarial perturbations across epochs. Compared to state-of-the-art adversarial training methods, ATTA improves adversarial accuracy by up to 7.2% on CIFAR10 and requires 12~14x less training time on the MNIST and CIFAR10 datasets with comparable model robustness.
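The core idea of accumulating perturbations across epochs can be sketched as follows. This is a minimal toy illustration, not the paper's actual algorithm: it uses a hand-rolled linear model on random data, and takes a single FGSM-style attack step per epoch starting from the perturbation stored in the previous epoch, rather than restarting a full multi-step attack from scratch. All names, dimensions, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy sketch of inter-epoch perturbation accumulation (assumed simplification
# of the ATTA idea): the perturbation delta found in one epoch is stored and
# refined in the next, exploiting transferability between neighboring epochs.
rng = np.random.default_rng(0)
n, d = 64, 10
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n) * 2 - 1        # labels in {-1, +1}
w = np.zeros(d)                               # linear model weights
eps, alpha, lr = 0.3, 0.1, 0.1                # L_inf budget, attack/train steps

delta = np.zeros_like(X)                      # perturbations carried over epochs

def grad_loss_wrt_x(w, X, y):
    # gradient of the logistic loss log(1 + exp(-y * (x @ w))) w.r.t. inputs
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))     # sigmoid(-y * x@w)
    return (-p * y)[:, None] * w[None, :]

for epoch in range(20):
    # one cheap attack step per epoch, warm-started from the stored delta,
    # then projected back onto the L_inf ball of radius eps
    delta += alpha * np.sign(grad_loss_wrt_x(w, X + delta, y))
    delta = np.clip(delta, -eps, eps)
    # train the model on the accumulated adversarial examples
    Xadv = X + delta
    p = 1.0 / (1.0 + np.exp(y * (Xadv @ w)))
    grad_w = ((-p * y)[:, None] * Xadv).mean(axis=0)
    w -= lr * grad_w
```

Because each epoch reuses the previous epoch's perturbation, the adversarial examples strengthen over training while only one attack step is paid per epoch, which is where the claimed training-time savings would come from.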