In recent years, large language models (LLMs) have achieved substantial advancements and are increasingly integrated into critical applications across various domains. This growing adoption underscores the need to ensure their security and robustness. In this work, we focus on the impact of Bit Flip Attacks (BFAs) on LLMs; these attacks exploit hardware faults to corrupt model parameters, posing a significant threat to model integrity and performance. Existing studies of BFAs against LLMs adopt a progressive bit-search strategy that predominantly relies on gradient-based techniques to identify sensitive layers or weights. However, computing gradients poses two specific challenges: first, at LLM scale it incurs substantial computational and memory costs; second, it requires access to a sample victim dataset or knowledge of the victim domain. In this work, we look beyond attack efficacy and aim to develop an efficient, practical gradient- and data-free Bit-Flip Attack. The challenge is that adversarial attacks conventionally depend on computing gradients over sample test/train data and manipulating model weights based on that gradient information. To overcome this, we propose novel vulnerability index metrics that identify vulnerable weight bits in LLMs without any gradient computation or data knowledge. By removing the dependency on gradients, our approach drastically reduces memory requirements and scales efficiently across multiple tasks with constant complexity. Experimental results on five open-source LLMs demonstrate the efficiency of our method, which achieves its adversarial objectives with as few as a single bit flip.
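To make the gradient- and data-free idea concrete, the following is a minimal illustrative sketch, not the paper's actual vulnerability index metric: it assumes FP16 weights and uses a simple magnitude-jump proxy, scoring each weight by how much its value changes when the most significant exponent bit is flipped. All function names and the scoring rule here are hypothetical placeholders introduced for illustration only.

```python
import numpy as np

def flip_exponent_msb(w):
    """Flip the most significant exponent bit (bit 14) of FP16 value(s)."""
    # Reinterpret the FP16 bit pattern as uint16, XOR bit 14, reinterpret back.
    bits = np.asarray(w, dtype=np.float16).view(np.uint16)
    return (bits ^ np.uint16(1 << 14)).view(np.float16)

def vulnerability_scores(weights):
    """Hypothetical data/gradient-free proxy: score each weight by the
    magnitude jump |w_flipped - w| caused by a single exponent-bit flip."""
    w = np.asarray(weights, dtype=np.float16)
    flipped = flip_exponent_msb(w)
    return np.abs(flipped.astype(np.float32) - w.astype(np.float32))

# Usage example: pick the single most "vulnerable" bit in a random weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(1024, 1024)).astype(np.float16)
scores = vulnerability_scores(W)
idx = np.unravel_index(np.argmax(scores), W.shape)
print(f"most vulnerable weight at {idx}: {W[idx]} -> {flip_exponent_msb(W[idx])}")
```

Note that this ranking needs only the stored weight values themselves: no victim data, no forward/backward passes, and no gradient storage, which is the property the abstract highlights.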