Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored. In this work, we address this gap by introducing a framework inspired by cognitive psychology and education. Specifically, we decompose general learning ability into three distinct, complementary dimensions: Learning from Instructor (acquiring knowledge via explicit guidance), Learning from Concept (internalizing abstract structures and generalizing to new contexts), and Learning from Experience (adapting through accumulated exploration and feedback). We conduct a comprehensive empirical study across these three learning dimensions and identify several insightful findings, such as (i) interaction improves learning; (ii) conceptual understanding is scale-emergent and benefits larger models; and (iii) LLMs are effective few-shot learners but not many-shot learners. Based on our framework and empirical findings, we introduce a benchmark that provides a unified and realistic evaluation of LLMs' general learning abilities across the three cognitive dimensions of learning. The benchmark yields diagnostic insights and supports the evaluation and development of more adaptive, human-like models.