AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents

Large language models (LLMs) perform well on general programming tasks. In Machine Learning Engineering (MLE) scenarios such as AutoML and Kaggle competitions, however, high performance depends heavily on expert intervention and repeated adjustment rather than on simply generating correct code. Applied directly to these tasks, LLMs often lack fine-grained domain priors, and existing MLE approaches based on linear or tree-structured search restrict knowledge transfer to adjacent hierarchical links. As a result, they cannot leverage full past trajectories or share information across branches, which limits both their self-evolving ability and the diversity of the search space. To address these limitations, we introduce AutoMLGen, an LLM-based coding agent that integrates a domain knowledge base for high-quality prior guidance with Monte Carlo Graph Search (MCGS) for efficient exploration. MCGS retains the tree-guided exploration of Monte Carlo Tree Search (MCTS) while embedding a graph structure into the expansion stage, which enables dynamic path reorganization, historical trajectory reuse, and multi-solution fusion and thereby supports both self-evolution and collaborative learning. Combined with fine-grained operator sets, this design improves stability and accelerates convergence. Evaluation on MLE-Bench shows that AutoMLGen achieves state-of-the-art performance on multiple metrics, including the average medal rate and the valid submission rate, under a 12-hour budget (half the standard runtime). The code is available at https://github.com/Alpha-Innovator/InternAgent.
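To make the MCGS idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It keeps an MCTS-style selection and backpropagation loop over parent-child links, and adds graph edges at expansion time so that a new candidate can also draw on a node from another branch (a stand-in for multi-solution fusion). All names here (`Node`, `refs`, `fuse_prob`, `max_children`) and the toy scoring rule are assumptions made for illustration; in a real agent the score would come from running LLM-generated code on a validation set.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursive comparisons
class Node:
    """One candidate solution in the search graph (hypothetical structure)."""
    name: str
    score: float = 0.0                 # toy stand-in for a validation metric
    visits: int = 0
    total_reward: float = 0.0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    refs: List["Node"] = field(default_factory=list)  # cross-branch graph edges


def uct(node, c=1.4):
    """Standard UCT value used for tree-guided selection."""
    if node.visits == 0:
        return math.inf
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select(root, max_children=2):
    """Descend by UCT until reaching a node that can still be expanded."""
    node = root
    while len(node.children) >= max_children:
        node = max(node.children, key=uct)
    return node


def all_nodes(root):
    """Collect every node reachable through parent-child links."""
    stack, out = [root], []
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out


def expand(root, node, rng, fuse_prob=0.3):
    """Create a child; with probability fuse_prob also reference another branch."""
    path = set()
    cur = node
    while cur is not None:             # ids of the node and its ancestors
        path.add(id(cur))
        cur = cur.parent
    nodes = all_nodes(root)
    others = [n for n in nodes if id(n) not in path]
    child = Node(name=f"n{len(nodes)}", parent=node)
    base = node.score
    if others and rng.random() < fuse_prob:
        ref = max(others, key=lambda n: n.score)
        child.refs.append(ref)         # graph edge: reuse a past trajectory
        base = max(base, ref.score)    # toy model of fusing two solutions
    # Toy evaluation (assumption): a noisy change relative to the base score.
    child.score = min(1.0, max(0.0, base + rng.uniform(-0.05, 0.1)))
    node.children.append(child)
    return child


def backpropagate(node):
    """Propagate the new node's score up its tree path (refs are not updated)."""
    reward = node.score
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def search(iterations=30, seed=0):
    rng = random.Random(seed)
    root = Node(name="root", score=0.2)
    for _ in range(iterations):
        leaf = select(root)
        child = expand(root, leaf, rng)
        backpropagate(child)
    return root


if __name__ == "__main__":
    root = search()
    best = max(all_nodes(root), key=lambda n: n.score)
    print(best.name, round(best.score, 3), [r.name for r in best.refs])
```

In this sketch the tree edges alone drive selection and backpropagation, while the `refs` edges only influence how a new candidate is built. Whether rewards should also flow along the cross-branch edges is a design choice that the abstract does not specify.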
