Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. However, little is known about the practical usefulness of these pull requests and the extent to which their contributions are accepted in real-world projects. In this paper, we empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 diverse open-source projects. Our analysis reveals that developers tend to rely on agents for tasks such as refactoring, documentation, and testing. The results indicate that 83.8% of these agent-assisted PRs are eventually accepted and merged by project maintainers, with 54.9% of the merged PRs integrated without further modification. The remaining 45.1% benefit from additional human revisions, especially for bug fixes, documentation, and adherence to project-specific standards. These findings suggest that while agent-assisted PRs are largely acceptable, they still benefit from human oversight and refinement.