Pre-trained Adversarial Perturbations

Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, it is well known that deep learning models lack robustness to adversarial examples, which can also raise security issues for pre-trained models, although this risk has been less explored. In this paper, we study the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs): universal perturbations crafted on a pre-trained model that remain effective when attacking its fine-tuned versions, without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method, which generates effective PAPs by lifting the neuron activations of low-level layers of the pre-trained model. Equipped with an enhanced noise augmentation strategy, L4A generates more transferable PAPs against fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
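The idea of lifting low-level activations can be illustrated with a minimal sketch. The abstract does not specify the L4A loss, the layers used, or the noise augmentation, so everything below is an assumption made for illustration. The "low-level layer" is a toy frozen map ReLU(Wx), the lifting objective is the mean of 0.5 * ||ReLU(W(x + delta))||^2, and the optimizer is sign-gradient ascent with projection onto an L-infinity ball. The names `layer`, `lift_energy`, and `craft_universal_perturbation` are hypothetical.

```python
import random


def layer(W, x):
    """Toy frozen low-level layer (an assumption): ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def lift_energy(W, xs, delta):
    """Mean of 0.5 * ||layer(x + delta)||^2 over all inputs."""
    total = 0.0
    for x in xs:
        a = layer(W, [xi + di for xi, di in zip(x, delta)])
        total += 0.5 * sum(t * t for t in a)
    return total / len(xs)


def craft_universal_perturbation(W, xs, eps=0.1, step=0.02, iters=20,
                                 noise_std=0.0, seed=0):
    """Sign-gradient ascent on one shared delta, kept within [-eps, eps].

    noise_std adds Gaussian noise to the perturbed inputs; this is only a
    stand-in for the paper's noise augmentation, whose details are not
    given in the abstract.
    """
    rng = random.Random(seed)
    d = len(xs[0])
    delta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for x in xs:
            z = [xi + di + rng.gauss(0.0, noise_std)
                 for xi, di in zip(x, delta)]
            a = layer(W, z)
            # Gradient of 0.5 * ||ReLU(W z)||^2 w.r.t. z is W^T ReLU(W z).
            for row, ai in zip(W, a):
                for j in range(d):
                    grad[j] += ai * row[j]
        # Ascent step along the gradient sign, then L-infinity projection.
        delta = [max(-eps, min(eps, dj + step * ((g > 0) - (g < 0))))
                 for dj, g in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    W = [[1.0, 0.5], [-0.5, 1.0], [0.3, 0.3]]
    xs = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
    delta = craft_universal_perturbation(W, xs)
    print("delta:", delta)
    print("energy before:", lift_energy(W, xs, [0.0, 0.0]))
    print("energy after: ", lift_energy(W, xs, delta))
```

Because the perturbation is optimized only against the frozen early layer and never sees a task head or labels, the sketch reflects the setting described above, in which the attacker has no knowledge of the downstream task. Whether such a perturbation transfers to fine-tuned models is an empirical question that this toy example does not address.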
