Large language models (LLMs) are reshaping automated program repair. We present a unified taxonomy that groups 62 recent LLM-based repair systems into four paradigms defined by parameter adaptation and control authority over the repair loop, and overlays two cross-cutting layers for retrieval and analysis augmentation. Prior surveys have either focused on classical software repair techniques, on LLMs in software engineering more broadly, or on subsets of LLM-based software repair, such as fine-tuning strategies or vulnerability repair. We complement these works by treating fine-tuning, prompting, procedural pipelines, and agentic frameworks as first-class paradigms and systematically mapping representative systems to each paradigm. We also consolidate evaluation practice on common benchmarks by recording benchmark scope, pass@k, and fault-localization assumptions to support a more meaningful comparison of reported success rates. We clarify trade-offs among paradigms in task alignment, deployment cost, controllability, and ability to repair multi-hunk or cross-file bugs. We discuss challenges in current LLM-based software repair and outline research directions. Our artifacts, including the representative papers and the scripted survey pipeline, are publicly available at https://github.com/GLEAM-Lab/ProgramRepair.
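Since the abstract anchors comparison of reported success rates on pass@k, a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) is given below; the function name and the example numbers are illustrative, not drawn from the surveyed systems.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total candidate patches sampled for a bug
    c: number of those candidates that pass all tests
    k: evaluation budget (patches a user would inspect)
    Returns the estimated probability that at least one of k
    randomly drawn candidates is correct: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer failing candidates than the budget: success is certain.
        return 1.0
    # Compute 1 - C(n-c, k) / C(n, k) as a running product for numerical stability.
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Illustrative usage: 20 samples, 3 plausible patches, budget of 1.
print(pass_at_k(n=20, c=3, k=1))  # 0.15
```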