LLM-based software engineering is influencing modern software development. In addition to correctness, prior studies have examined the performance of software artifacts generated by AI agents. However, it remains unclear how agentic AI systems address performance concerns in practice. In this paper, we present an empirical study of performance-related pull requests generated by AI agents. Using LLM-assisted detection and BERTopic-based topic modeling, we identified 52 performance-related topics grouped into 10 higher-level categories. Our results show that AI agents apply performance optimizations across diverse layers of the software stack, and that the type of optimization significantly affects pull request acceptance rates and review times. We also found that performance optimization by AI agents occurs primarily during the development phase, with less attention to the maintenance phase. Our findings provide empirical evidence to support the evaluation and improvement of agentic AI systems with respect to their performance optimization behaviors and review outcomes.
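To make the topic modeling step concrete, the following is a minimal sketch of how pull request descriptions could be clustered with BERTopic. The sample texts, the "all-MiniLM-L6-v2" embedding model, and the UMAP/HDBSCAN parameters are assumptions for illustration only; they do not reflect the paper's actual dataset or configuration.

```python
# Illustrative BERTopic pipeline over hypothetical PR titles.
# All inputs and parameters below are placeholders, not the study's setup.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP

# Hypothetical performance-related pull request titles standing in for a mined PR corpus.
pr_texts = [
    "Cache compiled regex patterns to avoid repeated compilation",
    "Reduce redundant database queries in the dashboard endpoint",
    "Replace O(n^2) duplicate check with a set-based lookup",
    "Batch API calls to cut network round trips",
    "Add an LRU cache for expensive configuration lookups",
    "Stream large file uploads instead of buffering them in memory",
    "Use a connection pool to reuse database connections",
    "Avoid re-rendering unchanged UI components",
    "Precompute aggregation results in a background job",
    "Switch JSON parsing to a faster serializer",
    "Index the frequently filtered created_at column",
    "Lazy-load heavy modules to speed up startup",
]

# Small-corpus settings: fewer UMAP neighbors and a lower minimum topic size
# than the defaults, which assume much larger document collections.
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = UMAP(n_neighbors=5, n_components=2, random_state=42)
topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    min_topic_size=2,
)

# Assign each pull request description to a discovered topic.
topics, probs = topic_model.fit_transform(pr_texts)

# Inspect topic sizes and their representative terms.
print(topic_model.get_topic_info())
```

In a study like the one described, the resulting low-level topics would then be manually or semi-automatically grouped into higher-level categories; that grouping step is not shown here.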