Model merging aims to integrate task-specific abilities from individually fine-tuned models into a single model without extra training. In recent model merging methods, the task vector has become a fundamental building block, as it encapsulates the residual information from fine-tuning. However, the merged model often suffers from notable performance degradation due to conflicts caused by task-irrelevant redundancy in task vectors. Existing efforts to overcome this redundancy randomly drop elements in the parameter space, which introduces randomness and lacks knowledge awareness. To address these challenges, we propose Purifying TAsk Vectors (PAVE) in a knowledge-aware subspace. Concretely, we sample some training examples from each task and feed them into the corresponding fine-tuned model to acquire the covariance matrices of the inputs to its linear layers. We then perform a context-oriented singular value decomposition, which accentuates the weight components most relevant to the target knowledge. As a result, we can split the fine-tuned model weights into task-relevant and redundant components in the knowledge-aware subspace, and purify the task vector by pruning the redundant components. To keep the pruning effort fair across models, we further introduce a spectral rank allocation strategy that optimizes a normalized activated pruning error. As a plug-and-play scheme, our task vector purification is applicable to various task-vector-based merging methods and improves their performance. In experiments, we demonstrate the effectiveness of PAVE across a diverse set of merging methods, tasks, and model architectures.
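As a rough illustration of the purification step for a single linear layer, the following NumPy sketch decomposes the fine-tuned weight in the subspace induced by the input covariance and keeps only the leading components. It is a sketch under stated assumptions, not the reference implementation: the function names, the use of an uncentered second-moment matrix as the covariance, the small ridge term, the fixed `rank` argument (standing in for the spectral rank allocation, which is not modeled here), and the choice to form the purified task vector as the task-relevant component minus the pre-trained weight are all assumptions made for illustration.

```python
import numpy as np


def input_covariance(x, eps=1e-6):
    """Uncentered covariance of layer inputs x with shape (n_samples, d_in).

    A small ridge term (an assumption of this sketch) keeps the matrix invertible.
    """
    c = x.T @ x / x.shape[0]
    ridge = eps * np.trace(c) / c.shape[0]
    return c + ridge * np.eye(c.shape[0])


def context_oriented_split(w_ft, x, rank, eps=1e-6):
    """Split w_ft (d_out, d_in) into a rank-`rank` part and the remainder.

    The SVD is taken of w_ft @ C rather than of w_ft itself, so the leading
    components are those that matter most on the sampled inputs.
    """
    c = input_covariance(x, eps)
    u, s, vt = np.linalg.svd(w_ft @ c, full_matrices=False)
    # w_ft = U diag(s) V^T C^{-1}; C is symmetric, so solve against C directly.
    vt_cinv = np.linalg.solve(c, vt.T).T
    relevant = (u[:, :rank] * s[:rank]) @ vt_cinv[:rank]
    redundant = w_ft - relevant
    return relevant, redundant


def purified_task_vector(w_pre, w_ft, x, rank):
    """Task vector built from the task-relevant component only (assumed form)."""
    relevant, _ = context_oriented_split(w_ft, x, rank)
    return relevant - w_pre
```

With `rank` equal to the full rank of the layer, the split returns the fine-tuned weight unchanged and the purified task vector reduces to the ordinary task vector `w_ft - w_pre`; smaller ranks discard the components that contribute least on the sampled activations.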