Supervised Fine-Tuning (SFT) is used to specialize model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models at inference time with instructions or demonstrations in the prompt. In data-scarce settings, ICL can offer better generalization and better-calibrated responses than SFT, at the cost of additional inference compute. In this work, we ask: Can ICL's internal computations be used to improve the quality of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation, and to harness ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique that aims to replicate ICL's activation patterns in SFT models, incentivizing ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by our extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful but also offers a conceptual window into the inner mechanics of model adaptation.
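The abstract describes IA2 only at a high level. As a rough illustration of what "replicating ICL's activation patterns" via self-distillation could look like, the sketch below assumes a frozen teacher pass on the ICL-formatted input (demonstrations plus query) and a trainable student pass on the bare query, with a mean-squared-error alignment term on the shared target-response tokens; the function name, token-alignment scheme, and loss choice are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of an IA2-style activation-alignment loss (illustrative assumption,
# not the paper's exact objective). A frozen copy of the model processes the
# ICL-formatted input, the trainable model processes the bare query, and the
# student's hidden states on the shared target tokens are pulled toward the
# teacher's with an MSE loss.
import torch
import torch.nn.functional as F

def ia2_alignment_loss(student_hiddens, teacher_hiddens, n_target_tokens):
    """Mean per-layer MSE between student and teacher activations on the last
    `n_target_tokens` positions (the target-response tokens, which both the
    ICL-formatted and bare inputs are assumed to end with).

    student_hiddens / teacher_hiddens: tuples of [batch, seq_len, hidden] tensors,
    one per layer, e.g. the `hidden_states` returned by a Hugging Face causal LM
    called with `output_hidden_states=True`.
    """
    losses = []
    for s, t in zip(student_hiddens, teacher_hiddens):
        s_tail = s[:, -n_target_tokens:, :]
        t_tail = t[:, -n_target_tokens:, :].detach()  # teacher activations are fixed targets
        losses.append(F.mse_loss(s_tail, t_tail))
    return torch.stack(losses).mean()

# Toy usage with random activations standing in for real model outputs
# (4 layers, batch of 2, student sees 16 tokens, teacher sees 48 ICL-prompt tokens).
student = tuple(torch.randn(2, 16, 64, requires_grad=True) for _ in range(4))
teacher = tuple(torch.randn(2, 48, 64) for _ in range(4))
loss = ia2_alignment_loss(student, teacher, n_target_tokens=8)
loss.backward()
print(loss.item())
```

In this reading, the alignment term would be minimized as a priming step before the standard SFT cross-entropy objective, rather than replacing it.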