Text-to-image (T2I) diffusion models enable high-quality image generation conditioned on textual prompts. However, fine-tuning these pre-trained models for personalization raises concerns about unauthorized dataset usage. To address this issue, dataset ownership verification (DOV) has recently been proposed, which embeds watermarks into fine-tuning datasets via backdoor techniques. These watermarks remain dormant on benign samples but produce owner-specified outputs when triggered. Despite its promise, the robustness of DOV against copyright evasion attacks (CEA) remains unexplored. In this paper, we investigate how adversaries can circumvent these mechanisms, enabling models trained on watermarked datasets to bypass ownership verification. We begin by analyzing the limitations of potential attacks based on backdoor removal, including TPD and T2IShield. In practice, TPD suffers from inconsistent effectiveness due to its randomness, while T2IShield fails when watermarks are embedded as local image patches. To address these limitations, we introduce CEAT2I, the first CEA specifically targeting DOV in T2I diffusion models. CEAT2I consists of three stages: (1) motivated by the observation that T2I models converge faster on watermarked samples, a difference reflected in intermediate features rather than in the training loss, we reliably detect watermarked samples; (2) we iteratively ablate tokens from the prompts of detected samples and monitor the resulting feature shifts to identify trigger tokens; and (3) we apply a closed-form concept erasure method to remove the injected watermarks. Extensive experiments demonstrate that CEAT2I effectively evades state-of-the-art DOV mechanisms while preserving model performance. The code is available at https://github.com/csyufei/CEAT2I.
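Stage (2), trigger-token identification, can be illustrated with a minimal leave-one-out sketch. It assumes a hypothetical `feature_fn` that maps a token list to an intermediate feature vector. In practice this vector would come from the fine-tuned diffusion model; here a toy function stands in for it. The sketch also assumes L2 distance as the feature-shift measure and a fixed `threshold`. None of these names or choices are taken from the CEAT2I code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: tokens -> intermediate feature vector.
FeatureFn = Callable[[List[str]], List[float]]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_trigger_tokens(
    tokens: List[str], feature_fn: FeatureFn, threshold: float
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Ablate each token in turn and flag those whose removal shifts features a lot.

    Leave-one-out ablation is an illustrative assumption, not necessarily
    the exact procedure used by CEAT2I.
    """
    base = feature_fn(tokens)
    shifts = []
    for i, tok in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]  # prompt with token i removed
        shifts.append((tok, l2_distance(base, feature_fn(ablated))))
    triggers = [tok for tok, shift in shifts if shift > threshold]
    return triggers, shifts


def toy_feature_fn(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for model features: the token 'sks' acts as a trigger."""
    benign = sum(1 for t in tokens if t != "sks")
    watermark = 10.0 if "sks" in tokens else 0.0
    return [0.1 * benign, watermark]


if __name__ == "__main__":
    prompt = "a photo of sks dog".split()
    triggers, shifts = find_trigger_tokens(prompt, toy_feature_fn, threshold=1.0)
    for tok, shift in shifts:
        print(f"{tok:>6}: shift = {shift:.3f}")
    print("trigger tokens:", triggers)
```

In this toy setup, removing an ordinary token moves the features by about 0.1, while removing the trigger collapses the watermark component and produces a much larger shift, so only `sks` exceeds the threshold.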