Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU methods typically assume access to identified corrupted training samples (a ``forget set''). However, in many real-world scenarios the training data are no longer accessible. We formalize \emph{source-free} CMU, where the original training data are unavailable and, consequently, no forget set of identified corrupted training samples can be specified. Instead, we assume a small proxy (surrogate) set of corrupted samples that reflects the suspected corruption type without needing to contain the original training samples. In this stricter setting, methods that rely on a forget set are ineffective or narrow in scope. We introduce \textit{Corrective Unlearning in Task Space} (CUTS), a lightweight weight-space correction method guided by the proxy set using task arithmetic principles. CUTS treats the clean signal and the corruption signal as distinct tasks. Specifically, we briefly fine-tune the corrupted model on the proxy set to amplify the corruption mechanism in weight space, compute the difference between the corrupted and fine-tuned weights as a proxy task vector, and subtract a calibrated multiple of this vector to cancel the corruption. Without access to clean data or a forget set, CUTS recovers a large fraction of the lost utility under label noise and, for backdoor triggers, nearly eliminates the attack with minimal damage to utility, outperforming state-of-the-art specialized CMU methods in the source-free setting.
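For concreteness, the weight-space edit described above can be sketched as follows. This is a minimal illustration under assumed PyTorch conventions, not the paper's implementation: the \texttt{finetune} helper and the scaling coefficient \texttt{alpha} stand in for the brief proxy-set fine-tuning and the calibrated multiple, neither of which is specified here, and the task vector is assumed to point from the corrupted toward the fine-tuned weights, following the usual task-arithmetic convention.

\begin{verbatim}
# Minimal sketch of the CUTS-style weight-space correction (assumptions noted
# in the text above): fine-tune on the proxy set to amplify the corruption,
# form the proxy task vector, then subtract a calibrated multiple of it.
import copy
import torch

def cuts_correct(corrupted_model, proxy_loader, alpha, finetune):
    # 1. Briefly fine-tune a copy of the corrupted model on the proxy set
    #    to amplify the corruption mechanism in weight space.
    #    `finetune` is a hypothetical helper supplied by the caller.
    amplified = finetune(copy.deepcopy(corrupted_model), proxy_loader)

    # 2./3. Compute the proxy task vector per parameter and subtract a
    #       calibrated multiple `alpha` of it from the corrupted weights.
    corrected = copy.deepcopy(corrupted_model)
    with torch.no_grad():
        for p_corr, p_amp, p_out in zip(corrupted_model.parameters(),
                                        amplified.parameters(),
                                        corrected.parameters()):
            task_vector = p_amp - p_corr               # proxy task vector
            p_out.copy_(p_corr - alpha * task_vector)  # cancel the corruption
    return corrected
\end{verbatim}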