GPU clouds have become a popular computing platform because of the high cost of owning and maintaining high-performance computing clusters. Many cloud architectures have also been proposed to provide a secure execution environment for guest applications by enforcing strong security policies that isolate the untrusted hypervisor from the guest virtual machines (VMs). In this paper, we study the impact of GPU hardware faults on the security of the cloud "trusted" execution environment, using Deep Neural Networks (DNNs) as the target application. We show that transient hardware faults can be induced in GPUs by exploiting Dynamic Voltage and Frequency Scaling (DVFS). These faults may cause computation errors, but on their own they have limited impact on DNN inference accuracy because well-developed DNN models are robust and fault-tolerant. To take full advantage of these transient faults, we propose the Lightning attack, which locates the fault injection targets in a DNN and precisely controls the timing and position of the injected faults. We conduct experiments on three commodity GPUs, attacking four widely used DNNs. Experimental results show that the proposed attack reduces the models' inference accuracy by up to 78.3\% and by 64.5\% on average. More importantly, 67.9\% of the targeted attacks successfully misled the models into producing our desired incorrect inference result. These results demonstrate that secure isolation on GPU clouds is vulnerable to transient hardware faults and that the computation results may not be trustworthy.