Self-evolving memory systems are reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly through environment interactions. However, this paradigm is fundamentally constrained by the static nature of the memory system itself: while memory facilitates agent-level evolution, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to $17.06\%$; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.
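The four-module design space that EvolveLab distills (encode, store, retrieve, manage) can be illustrated with a minimal sketch. All class and method names below are hypothetical illustrations of the modular decomposition, not the actual EvolveLab API; the encoder and retriever here are trivial keyword-based placeholders standing in for the LLM-based summarization and embedding retrieval a real system would use.

```python
# Minimal sketch of a modular memory system with four swappable
# operators: encode, store, retrieve, manage. Names are hypothetical.
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One encoded unit of experience (e.g., a distilled trajectory)."""
    key: str
    content: str
    score: float = 0.0  # usefulness estimate, updated by `manage`


class ModularMemory:
    """Composable memory; each method is one point in the design space."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.items: dict[str, MemoryItem] = {}

    def encode(self, trajectory: str) -> MemoryItem:
        # Placeholder encoder: a real system might distill the
        # trajectory into reusable experience with an LLM.
        words = trajectory.split()
        key = words[0] if words else "empty"
        return MemoryItem(key=key, content=trajectory)

    def store(self, item: MemoryItem) -> None:
        self.items[item.key] = item
        self.manage()  # keep the store within budget

    def retrieve(self, query: str, k: int = 3) -> list[MemoryItem]:
        # Placeholder retrieval: keyword overlap; a real system would
        # rank by embedding similarity instead.
        q = set(query.lower().split())
        ranked = sorted(
            self.items.values(),
            key=lambda it: len(q & set(it.content.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def manage(self) -> None:
        # Evict the lowest-scoring items when over capacity.
        while len(self.items) > self.capacity:
            worst = min(self.items.values(), key=lambda it: it.score)
            del self.items[worst.key]


mem = ModularMemory(capacity=2)
mem.store(mem.encode("search web for weather in Paris"))
mem.store(mem.encode("browse docs for API usage"))
top = mem.retrieve("weather Paris", k=1)
```

Because each of the four operators is an independent method, a meta-evolutionary loop in the spirit of MemEvolve could swap or rewrite any one of them (e.g., replacing the eviction policy in `manage`) without touching the rest of the agent.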