The integration of Large Language Models (LLMs) into software engineering has driven a transition from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is hindered by a lack of comprehensive understanding of how benchmarks and solutions interconnect. This survey addresses this gap by providing the first holistic analysis of LLM-powered software engineering, offering insights into evaluation methodologies and solution paradigms. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair. Our analysis highlights the evolution from simple prompt engineering to sophisticated agentic systems incorporating capabilities like planning, reasoning, memory mechanisms, and tool augmentation. To contextualize this progress, we present a unified pipeline illustrating the workflow from task specification to deliverables, detailing how different solution paradigms address various complexity levels. Unlike prior surveys that focus narrowly on specific aspects, this work connects 50+ benchmarks to their corresponding solution strategies, enabling researchers to identify optimal approaches for diverse evaluation criteria. We also identify critical research gaps and propose future directions, including multi-agent collaboration, self-evolving systems, and formal verification integration. This survey serves as a foundational guide for advancing LLM-driven software engineering. We maintain a GitHub repository that continuously updates the reviewed and related papers at https://github.com/lisaGuojl/LLM-Agent-SE-Survey.