A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial when it follows another high-conflict trial. This phenomenon offers an account of how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of the 13 vision-language models (VLMs) we tested exhibit behavior consistent with conflict adaptation; the lone exception likely reflects a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases Stroop errors while minimally affecting congruent trials.