Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving paradigm: the simulator acts as a dynamic curriculum policy, continuously generating tasks tailored to the agent's ``zone of proximal development''. This process is guided by a simple but effective $\alpha$-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to \textbf{+40.3\%} over 7B baselines and matches or exceeds the average performance of larger models. Compared to offline data augmentation based on Gemini 2.5 Pro, GenEnv achieves better performance while using 3.3$\times$ less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
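As an illustration only (the exact definition is not given in this abstract), one plausible form of such a difficulty-aligned curriculum reward scores a generated task by how close the agent's empirical success rate on it is to a target level $\alpha$; the notation $\hat{p}_{\theta}(g)$ for the agent's success rate on task $g$ and the absolute-distance form below are assumptions for exposition, not the paper's stated formula:
\begin{equation*}
r_{\alpha}(g) \;=\; 1 - \bigl| \hat{p}_{\theta}(g) - \alpha \bigr|,
\end{equation*}
under which the simulator is rewarded most for proposing tasks the current agent solves roughly an $\alpha$ fraction of the time, i.e., tasks near its zone of proximal development.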