iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
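To illustrate why the abstract contrasts exact-match verification with a similarity-based strategy, here is a minimal sketch. It is not iSeal's actual procedure: the function names, the fingerprint string, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration. It only shows that a small manipulation of the response defeats an exact-match check but not a thresholded similarity check.

```python
from difflib import SequenceMatcher


def exact_match_verify(response: str, expected: str) -> bool:
    """Ownership check that requires the response to equal the fingerprint."""
    return response.strip() == expected.strip()


def similarity_verify(response: str, expected: str, threshold: float = 0.8) -> bool:
    """Ownership check that tolerates small edits to the response.

    The threshold and the character-level similarity measure are
    illustrative assumptions, not values taken from the paper.
    """
    ratio = SequenceMatcher(
        None, response.strip().lower(), expected.strip().lower()
    ).ratio()
    return ratio >= threshold


# Hypothetical fingerprint response the owner expects from their model.
expected = "fingerprint-7f3a9c-owner-alice"
# A thief who controls inference lightly perturbs the output.
manipulated = "Fingerprint-7f3a9c-owner-alice!"
# An unrelated model answers with something else entirely.
unrelated = "I cannot help with that request."

print(exact_match_verify(manipulated, expected))   # exact match is evaded
print(similarity_verify(manipulated, expected))    # similarity check still fires
print(similarity_verify(unrelated, expected))      # unrelated output is rejected
```

A real scheme would also have to handle stronger manipulations such as paraphrasing or truncation, which is where the error-correction mechanism mentioned in the abstract would presumably come in.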

