Large language model (LLM) agents have demonstrated strong capabilities across diverse domains, yet automated agent design remains a significant challenge. Current automated agent design approaches are constrained by limited search spaces: they primarily optimize workflows and fail to integrate crucial human-designed components such as memory, planning, and tool use. These methods are also hampered by high evaluation costs, since evaluating even a single new agent on a benchmark can cost tens of dollars. Inefficient search strategies compound the difficulty: they struggle to navigate the large design space, which makes the discovery of novel agents slow and resource-intensive. To address these challenges, we propose AgentSwift, a framework for automated agent design. We formalize a hierarchical search space that jointly models the agentic workflow and composable functional components. Co-optimizing the functional components together with the workflow, rather than optimizing the workflow alone, enables the discovery of more complex and effective agent architectures. To make exploration of this expansive space feasible, we reduce evaluation costs by training a value model for low-cost evaluation on a high-quality dataset, generated by a strategy that combines combinatorial coverage with balanced Bayesian sampling. The search itself is guided by an uncertainty-informed hierarchical Monte Carlo tree search (MCTS) strategy that navigates the space efficiently. Evaluated on seven benchmarks spanning embodied, math, web, tool, and game domains, AgentSwift discovers agents that achieve an average performance gain of 8.34% over both existing automated agent search methods and manually designed agents. Our framework serves as a launchpad for researchers to rapidly discover powerful agent architectures.
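As a rough illustration of the ideas summarized above, the following minimal sketch shows one way a candidate agent could be represented as a workflow plus optional functional components, and how a search node might be scored using a value-model estimate together with an uncertainty bonus. It is not the AgentSwift implementation: the names `AgentDesign`, `Node`, `selection_score`, and `select_child`, the UCT-style exploration term, and the constants `c_explore` and `c_uncert` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AgentDesign:
    """Hypothetical point in a hierarchical search space:
    one workflow plus optional functional components."""
    workflow: str
    memory: Optional[str] = None
    planning: Optional[str] = None
    tool_use: Optional[str] = None


@dataclass
class Node:
    """Search-tree node holding a candidate design and value-model outputs."""
    design: AgentDesign
    predicted_value: float  # assumed value-model estimate of benchmark score
    uncertainty: float      # assumed uncertainty of that estimate
    visits: int = 0
    children: List["Node"] = field(default_factory=list)


def selection_score(node: Node, parent_visits: int,
                    c_explore: float = 1.0, c_uncert: float = 0.5) -> float:
    """UCT-style score with an extra uncertainty bonus (an assumed form)."""
    exploration = c_explore * math.sqrt(
        math.log(parent_visits + 1) / (node.visits + 1)
    )
    return node.predicted_value + exploration + c_uncert * node.uncertainty


def select_child(parent: Node) -> Node:
    """Pick the child with the highest selection score."""
    return max(parent.children,
               key=lambda child: selection_score(child, parent.visits))


# Usage: two candidates with equal predicted value; the less certain one wins.
root = Node(AgentDesign(workflow="react"), predicted_value=0.5, uncertainty=0.0)
root.children = [
    Node(AgentDesign("react", memory="episodic"), 0.5, uncertainty=0.1),
    Node(AgentDesign("react", planning="tree"), 0.5, uncertainty=0.3),
]
chosen = select_child(root)
```

In this sketch the uncertainty bonus steers the search toward designs the value model is least sure about, so that costly real benchmark evaluations can be reserved for candidates where they are most informative.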