Understanding how the human brain progresses from processing simple linguistic inputs to performing high-level reasoning is a fundamental challenge in neuroscience. While modern large language models (LLMs) are increasingly used to model neural responses to language, their internal representations are highly "entangled," mixing information about lexicon, syntax, meaning, and reasoning. This entanglement biases conventional brain encoding analyses toward linguistically shallow features (e.g., lexicon and syntax), making it difficult to isolate the neural substrates of cognitively deeper processes. Here, we introduce a residual disentanglement method that computationally isolates these components. Our method first probes an LLM to identify feature-specific layers, then iteratively regresses out lower-level representations to produce four nearly orthogonal embeddings for lexicon, syntax, meaning, and, critically, reasoning. We used these disentangled embeddings to model intracranial electrocorticography (ECoG) recordings from neurosurgical patients listening to natural speech. We show that: 1) the isolated reasoning embedding exhibits unique predictive power, accounting for variance in neural activity not explained by the other linguistic features and extending to visual regions beyond classical language areas; 2) the neural signature for reasoning is temporally distinct, peaking later (~350-400 ms) than signals related to lexicon, syntax, and meaning, consistent with its position atop a processing hierarchy; and 3) standard, non-disentangled LLM embeddings can be misleading, as their predictive success is primarily attributable to linguistically shallow features, masking the subtler contributions of deeper cognitive processing.
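The iterative "regress out lower-level representations" step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the regression type, so ordinary least squares is used; the function names `residualize` and `disentangle` are hypothetical; and the four layer embeddings are synthetic random matrices rather than activations from a probed model. With plain in-sample least squares the resulting components are orthogonal up to floating-point error, whereas the abstract reports only near-orthogonality, so the actual procedure likely differs in its details (for example, regularization or held-out fitting).

```python
import numpy as np

def residualize(target, predictors):
    """Return the part of `target` that a linear map from `predictors` cannot explain."""
    # Center both matrices so the intercept is handled implicitly.
    t = target - target.mean(axis=0)
    p = predictors - predictors.mean(axis=0)
    # Ordinary least squares (an assumption; the source does not name the regressor).
    coef, *_ = np.linalg.lstsq(p, t, rcond=None)
    return t - p @ coef

def disentangle(layers):
    """layers: list of (n_words, dim) arrays ordered from lowest to highest level."""
    residuals = [layers[0] - layers[0].mean(axis=0)]
    for emb in layers[1:]:
        # Regress the current level on all lower-level components found so far.
        lower = np.hstack(residuals)
        residuals.append(residualize(emb, lower))
    return residuals

# Synthetic stand-ins for the four feature-specific layers (hypothetical data).
rng = np.random.default_rng(0)
n_words, dim = 500, 8
names = ["lexicon", "syntax", "meaning", "reasoning"]
layers = [rng.standard_normal((n_words, dim))]
for _ in names[1:]:
    # Each higher layer mixes the layer below with new information: "entanglement".
    mix = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    layers.append(layers[-1] @ mix + rng.standard_normal((n_words, dim)))

parts = disentangle(layers)

# Largest absolute inner product between columns of any two different components.
max_cross = max(
    np.abs(parts[i].T @ parts[j]).max()
    for i in range(len(parts))
    for j in range(i)
)
print(f"largest cross-product between components: {max_cross:.2e}")
```

In this sketch, each component keeps only the variance that the levels below it cannot linearly predict, which is why an encoding model fit on the final component would reflect reasoning-specific information rather than the shallow features it was originally mixed with.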