The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks

Deep Generative Models (DGMs) allow users to synthesize data from complex, high-dimensional manifolds. Industry applications of DGMs include data augmentation to boost the performance of (semi-)supervised machine learning or to mitigate fairness or privacy concerns. Large-scale DGMs are notoriously hard to train, requiring expert skills, large amounts of data, and extensive computational resources. Thus, many enterprises can be expected to source pre-trained DGMs from potentially unverified third parties, e.g., open-source model repositories. As we show in this paper, such a deployment scenario opens a new attack surface, which allows adversaries to potentially undermine the integrity of entire machine learning development pipelines in a victim organization. Specifically, we describe novel training-time attacks that produce corrupted DGMs. These DGMs synthesize regular data under normal operation but produce designated target outputs for inputs sampled from a trigger distribution. The risk that harmful data enters the victim's machine learning development pipelines depends on how much control the adversary has over random number generation. That harmful data could cause material or reputational damage to the organization. Our attacks are based on adversarial loss functions that combine the dual objectives of attack stealth and fidelity. We show their effectiveness for a variety of DGM architectures (Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs)) and data domains (images and audio). Our experiments show that, even for large-scale industry-grade DGMs, our attacks can be mounted with only modest computational effort. We also investigate the effectiveness of different defensive approaches (based on static/dynamic model and output inspections) and prescribe a practical defense strategy that paves the way for the safe usage of DGMs.
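The abstract states only that the attack loss "combines the dual objectives of attack stealth and fidelity". The sketch below illustrates that idea under our own assumptions. It is not the paper's actual loss. We read "stealth" as the attacked generator matching the original generator on ordinary latent samples. We read "fidelity" as the attacked generator emitting the adversary's target on a trigger input. The squared Euclidean distance, the weight `lam`, and every name here (`backdoor_loss`, `original`, `backdoored`, `TRIGGER`, `TARGET`) are hypothetical. The code only evaluates the loss on toy 2-D "generators". A real attack would instead minimize such a loss over the generator's parameters during training.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical illustration only: an assumed stealth/fidelity loss,
# NOT the formulation used in the paper.
Vector = List[float]
Generator = Callable[[Sequence[float]], Vector]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def backdoor_loss(
    attacked: Generator,
    original: Generator,
    benign_batch: Sequence[Sequence[float]],
    trigger_z: Sequence[float],
    target_x: Sequence[float],
    lam: float = 0.5,
) -> float:
    """Weighted sum of an (assumed) fidelity term and stealth term."""
    # Stealth: on ordinary latent samples, the attacked generator should
    # reproduce the original generator's outputs.
    stealth = sum(
        squared_distance(attacked(z), original(z)) for z in benign_batch
    ) / len(benign_batch)
    # Fidelity: on the trigger input, the attacked generator should emit
    # the adversary's designated target output.
    fidelity = squared_distance(attacked(trigger_z), target_x)
    return lam * fidelity + (1.0 - lam) * stealth


# Toy 2-D "generator" standing in for a real DGM.
def original(z: Sequence[float]) -> Vector:
    return [2.0 * v for v in z]


TRIGGER = [10.0, 10.0]  # hypothetical trigger, far from typical N(0, I) samples
TARGET = [1.0, -1.0]    # hypothetical designated target output


def backdoored(z: Sequence[float]) -> Vector:
    # Hand-coded for illustration: behaves like `original` everywhere
    # except on the trigger input (a real attack learns this into weights).
    return list(TARGET) if list(z) == TRIGGER else original(z)


rng = random.Random(0)
benign = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(64)]

# The unmodified generator is perfectly stealthy but ignores the trigger.
clean_loss = backdoor_loss(original, original, benign, TRIGGER, TARGET)
# The backdoored generator satisfies both objectives on this toy setup.
attack_loss = backdoor_loss(backdoored, original, benign, TRIGGER, TARGET)
print(clean_loss, attack_loss)
```

The unmodified generator is penalized only through the fidelity term. It maps the trigger to `[20.0, 20.0]` instead of the target. The backdoored toy generator drives both terms to zero. The weight `lam` is where an adversary would trade off stealth against fidelity.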
