The security of deep neural networks (DNNs) has attracted increasing attention due to their widespread use in various applications. Recently, deployed DNNs have been shown to be vulnerable to Trojan attacks, which flip bits in the model parameters to inject a hidden behavior that is then activated by a specific trigger pattern. However, all existing Trojan attacks adopt noticeable patch-based triggers (e.g., a square pattern), which makes them perceptible to humans and easy for machines to detect. In this paper, we present a novel attack, namely the hardly perceptible Trojan attack (HPT). HPT crafts hardly perceptible Trojan images by using additive noise and a per-pixel flow field to tweak the pixel values and positions of the original images, respectively. To achieve superior attack performance, we propose to jointly optimize the bit flips, additive noise, and flow field. Since the weight bits of DNNs are binary, this problem is very hard to solve. We handle the binary constraint with an equivalent replacement and provide an effective optimization algorithm. Extensive experiments on the CIFAR-10, SVHN, and ImageNet datasets show that the proposed HPT can generate hardly perceptible Trojan images while achieving comparable or better attack performance than state-of-the-art methods. The code is available at: https://github.com/jiawangbai/HPT.
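The Trojan-image construction described above (additive noise changes pixel values, a per-pixel flow field changes pixel positions) can be sketched as follows. This is a minimal NumPy sketch under our own assumptions, not the HPT implementation: the noise is clipped to an L-infinity budget `eps` (a hypothetical value), it is applied before the flow warp, and the flow field is sampled with bilinear interpolation and border clamping, in the style of spatially transformed adversarial examples. The names `flow_warp` and `make_trojan_image` are hypothetical, and the joint optimization of noise, flow, and weight bit flips is not shown.

```python
import numpy as np


def flow_warp(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp an H x W x C image with a per-pixel flow field of shape H x W x 2.

    flow[i, j] = (du, dv) means output pixel (i, j) samples the input at the
    (possibly fractional) location (i + du, j + dv). Sampling uses bilinear
    interpolation, and locations outside the image are clamped to the border.
    """
    h, w = image.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(rows + flow[..., 0], 0, h - 1)
    src_c = np.clip(cols + flow[..., 1], 0, w - 1)

    # Integer corners surrounding each sampling location.
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)

    # Fractional offsets, broadcast over the channel axis.
    wr = (src_r - r0)[..., None]
    wc = (src_c - c0)[..., None]

    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom


def make_trojan_image(image: np.ndarray, noise: np.ndarray, flow: np.ndarray,
                      eps: float = 8 / 255) -> np.ndarray:
    """Hypothetical helper: bounded additive noise, then flow warping.

    The noise budget `eps` and the noise-then-warp ordering are assumptions
    made for illustration. Pixel values are assumed to lie in [0, 1].
    """
    noisy = image + np.clip(noise, -eps, eps)  # tweak pixel values
    warped = flow_warp(noisy, flow)            # tweak pixel positions
    return np.clip(warped, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((32, 32, 3))
    noise = rng.normal(scale=0.01, size=x.shape)
    flow = rng.normal(scale=0.3, size=(32, 32, 2))
    x_trojan = make_trojan_image(x, noise, flow)
    print("mean absolute change:", np.abs(x_trojan - x).mean())
```

With zero noise and a zero flow field, `make_trojan_image` returns the input unchanged, which is a quick sanity check; an integer flow of 1 along the column axis shifts the image content one pixel to the left, with the last column clamped to the border. In an actual attack, `noise` and `flow` would be optimization variables (typically with a smoothness penalty on the flow so the warp stays hard to perceive), rather than random draws as here.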