Deep neural networks (DNNs) are known to be vulnerable to adversarial examples/attacks, raising concerns about their reliability in safety-critical applications. A number of defense methods have been proposed to train robust DNNs resistant to adversarial attacks, among which adversarial training has so far demonstrated the most promising results. However, recent studies have shown that there exists an inherent tradeoff between accuracy and robustness in adversarially-trained DNNs. In this paper, we propose a novel technique, Dual-Head Adversarial Training (DH-AT), to further improve the robustness of existing adversarial training methods. Different from existing improved variants of adversarial training, DH-AT modifies both the architecture of the network and the training strategy to seek greater robustness. Specifically, DH-AT first attaches a second network head (or branch) to one intermediate layer of the network, then uses a lightweight convolutional neural network (CNN) to aggregate the outputs of the two heads. The training strategy is also adapted to reflect the relative importance of the two heads. We empirically show, on multiple benchmark datasets, that DH-AT brings notable robustness improvements to existing adversarial training methods. Compared with TRADES, a state-of-the-art adversarial training method, DH-AT improves robustness by 3.4% against PGD40 and 2.3% against AutoAttack, and also improves clean accuracy by 1.8%.
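The dual-head data flow described above (shared lower layers, two heads branching from an intermediate layer, and a lightweight aggregator combining their outputs) can be sketched as follows. This is a minimal illustration only: the abstract does not specify layer sizes or the aggregator's internals, so all dimensions here are hypothetical, plain linear layers stand in for the network's convolutional blocks, and a single learned linear mix stands in for the paper's lightweight CNN aggregator.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions -- not specified in the abstract.
batch, feat_dim, mid_dim, num_classes = 4, 32, 16, 10

# Shared lower layers of the backbone, up to the intermediate
# layer where the second head is attached.
W_shared = rng.normal(size=(feat_dim, mid_dim))

# Head 1: the original network's remaining layers.
W_head1 = rng.normal(size=(mid_dim, num_classes))
# Head 2: the attached branch, reading the same intermediate features.
W_head2 = rng.normal(size=(mid_dim, num_classes))

# Aggregator: a learned linear mix of the two heads' logits
# (the paper uses a lightweight CNN here; this is a stand-in).
W_agg = rng.normal(size=(2 * num_classes, num_classes))

def forward(x):
    h = relu(x @ W_shared)                            # shared intermediate features
    logits1 = h @ W_head1                             # head 1 prediction
    logits2 = h @ W_head2                             # head 2 prediction
    combined = np.concatenate([logits1, logits2], axis=1)
    return combined @ W_agg                           # aggregated prediction

x = rng.normal(size=(batch, feat_dim))
out = forward(x)
print(out.shape)  # (4, 10)
```

The key structural point is that both heads share the same lower-layer features, so adversarial gradients flow through a common trunk while each head can specialize; the aggregator then learns how much to trust each head.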