Retrieval-augmented generation (RAG) incorporates external knowledge into large language models (LLMs), improving their adaptability to downstream tasks and enabling information updates. Surprisingly, recent empirical evidence shows that injecting noise into retrieved relevant documents facilitates the exploitation of external knowledge and improves generation quality. Although counterintuitive and difficult to apply in practice, this phenomenon permits fine-grained control and rigorous analysis of how LLMs integrate external knowledge. In this paper, we therefore use noise injection as a controlled intervention and establish a layer-specific functional demarcation within the LLM: shallow layers specialize in local context modeling, intermediate layers focus on integrating long-range external factual knowledge, and deeper layers primarily rely on parametric internal knowledge. Building on this insight, we propose Layer Fused Decoding (LFD), a simple decoding strategy that directly combines representations from an intermediate layer with final-layer decoding outputs to fully exploit the external factual knowledge. To identify the optimal intermediate layer, we introduce an internal knowledge score (IKS) criterion that selects the layer with the lowest IKS value in the latter half of the model's layers. Experimental results across multiple benchmarks demonstrate that LFD helps RAG systems surface retrieved contextual knowledge more effectively and at minimal cost.
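To make the layer-fusion idea concrete, below is a minimal sketch of one LFD-style decoding step, assuming a standard Hugging Face causal LM. The fusion rule shown (a convex combination of hidden states before the LM head), the weight alpha, and the fixed choice fuse_layer=9 are illustrative assumptions, not the paper's implementation; the abstract specifies only that intermediate-layer representations are combined with final-layer decoding outputs, with the layer chosen by the IKS criterion.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; any HF causal LM that exposes hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def lfd_next_token_logits(input_ids, fuse_layer, alpha=0.5):
    """One decoding step that fuses an intermediate layer's hidden state
    with the final layer's before the vocabulary projection."""
    out = model(input_ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[i] is the
    # output of transformer block i. Take the last-token position only.
    h_mid = out.hidden_states[fuse_layer][:, -1, :]
    h_final = out.hidden_states[-1][:, -1, :]  # already passed through the final norm
    # Assumed fusion rule: convex combination, then the usual LM head.
    # (A careful implementation might also apply the model's final norm to h_mid.)
    h_fused = (1.0 - alpha) * h_final + alpha * h_mid
    return model.lm_head(h_fused)

ids = tok("The capital of France is", return_tensors="pt").input_ids
# fuse_layer is fixed here for illustration; LFD would instead select the
# layer with the lowest IKS among the latter half of the model's layers.
next_logits = lfd_next_token_logits(ids, fuse_layer=9)
print(tok.decode(next_logits.argmax(dim=-1)))
```

Because the fusion happens only at the output projection, a sketch like this adds essentially no overhead beyond requesting hidden states, which is consistent with the minimal-cost claim above.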