Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels

From arXiv: 13 pages in total (7 pages of main text, 3 pages of references, 3 pages of appendix), 2 figures, 14 tables. Code is available at https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl

Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classifiers and contrasting their behavior under natural demonstrations (with correct labels) and inverted demonstrations (systematically flipping label meanings). We decompose ICL behavior into three alignment metrics (truth, prior, and prompt alignment) and introduce a semantic override rate, defined as correctness under flipped semantics. Across eight classification tasks and eight open-source LLMs (1–12B parameters), we find consistent evidence for a semantic anchor view. With natural demonstrations, ICL improves accuracy while maintaining strong prior alignment; most correct predictions coincide with zero-shot behavior, even when the prior is weak. With inverted demonstrations, models cannot learn coherent anti-semantic classifiers: prompt alignment increases only by sacrificing accuracy, and semantic override rates remain exactly zero in our few-shot 1–12B setting. Rather than flexibly remapping label meanings, ICL primarily adjusts how inputs project onto stable semantic directions learned during pre-training, clarifying fundamental limits of few-shot prompting and suggesting that overriding label semantics at these scales requires interventions beyond ICL. All code is available at: https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl.
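The abstract names the three alignment metrics without giving formulas. The following is a minimal Python sketch, assuming each metric is the fraction of test examples on which the ICL prediction agrees with a reference labeling: the gold labels (truth alignment), the model's own zero-shot predictions (prior alignment), or the labels the demonstrations imply, i.e. gold labels for natural prompts and flipped gold labels for inverted prompts (prompt alignment). It also computes the share of correct predictions that coincide with zero-shot behavior. The binary FLIP map, the function names, and the toy data are hypothetical and not taken from the authors' repository. The semantic override rate is left out because its exact formula is not stated here.

```python
from typing import Dict, Sequence

# Hypothetical binary label inversion; the paper's label sets and
# flipping scheme may differ.
FLIP: Dict[str, str] = {"positive": "negative", "negative": "positive"}


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def alignment_metrics(
    gold: Sequence[str],
    zero_shot: Sequence[str],
    icl: Sequence[str],
    inverted: bool,
) -> Dict[str, float]:
    """Assumed definitions of truth, prior, and prompt alignment."""
    # Labels the demonstrations "teach": gold for natural prompts,
    # flipped gold for inverted prompts.
    prompt_labels = [FLIP[g] for g in gold] if inverted else list(gold)
    return {
        "truth": agreement(icl, gold),            # accuracy vs. ground truth
        "prior": agreement(icl, zero_shot),       # agreement with zero-shot output
        "prompt": agreement(icl, prompt_labels),  # agreement with demonstrated mapping
    }


def prior_share_of_correct(
    gold: Sequence[str], zero_shot: Sequence[str], icl: Sequence[str]
) -> float:
    """Among correct ICL predictions, the fraction matching the zero-shot prediction."""
    correct = [i for i, (g, p) in enumerate(zip(gold, icl)) if g == p]
    if not correct:
        return 0.0
    return sum(zero_shot[i] == icl[i] for i in correct) / len(correct)


if __name__ == "__main__":
    # Toy, made-up predictions for four test inputs.
    gold = ["positive", "negative", "positive", "negative"]
    zero_shot = ["positive", "positive", "positive", "negative"]
    natural = ["positive", "negative", "positive", "negative"]
    inverted = ["positive", "positive", "negative", "negative"]
    print(alignment_metrics(gold, zero_shot, natural, inverted=False))
    print(alignment_metrics(gold, zero_shot, inverted, inverted=True))
    print(prior_share_of_correct(gold, zero_shot, natural))
```

Under these assumed definitions, on a binary task with every prediction drawn from the two labels, truth and prompt alignment under inverted demonstrations always sum to one, since each prediction matches either the gold label or its flip. This is one way to see why prompt alignment can rise only by sacrificing accuracy.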
