Code Refactoring with LLM: A Comprehensive Evaluation With Few-Shot Settings

Programmers' focus has shifted from writing complex, error-prone code to prioritizing simple, clear, efficient, and sustainable code that makes programs easier to understand. Code refactoring plays a critical role in this transition by improving structural organization and optimizing performance. However, existing refactoring methods often rely on manually crafted transformation rules, which limits their ability to generalize across multiple programming languages and coding styles. The objectives of this study are to (i) develop a Large Language Model (LLM)-based framework capable of performing accurate and efficient code refactoring across multiple languages (C, C++, C#, Python, Java), (ii) investigate the impact of prompt engineering (temperature, different shot settings) and instruction fine-tuning on refactoring effectiveness, and (iii) evaluate the quality improvements in refactored code (compilability, correctness, distance, similarity, number of lines, tokens, characters, cyclomatic complexity) through empirical metrics and human assessment. To accomplish these goals, we propose a fine-tuned, prompt-engineering-based model combined with few-shot learning for multilingual code refactoring. Experimental results indicate that Java achieves the highest overall correctness, up to 99.99% in the 10-shot setting, records the highest average compilability (94.78%) relative to the original source code, and maintains high similarity (approximately 53-54%), demonstrating a strong balance between structural modification and semantic preservation. Python exhibits the lowest structural distance across all shot settings (approximately 277-294) while achieving moderate similarity (approximately 44-48%), which indicates consistent and minimally disruptive refactoring.
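
To make the distance, similarity, and size metrics named above more concrete, the following is a minimal sketch of how such values could be computed for an original/refactored pair. The abstract does not define the metrics precisely, so everything here is an assumption made for illustration: distance is taken to be character-level Levenshtein distance, similarity is the `difflib.SequenceMatcher` ratio expressed as a percentage, and tokens are whitespace-separated chunks. The function names `levenshtein`, `size_metrics`, and `compare` are hypothetical and are not taken from the study.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, and substitute each cost 1.

    Assumption: the paper's "distance" metric may be defined differently.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def size_metrics(code: str) -> dict:
    """Line, token, and character counts for one code snippet."""
    return {
        "lines": len(code.splitlines()),
        "tokens": len(code.split()),  # assumption: whitespace-separated tokens
        "characters": len(code),
    }


def compare(original: str, refactored: str) -> dict:
    """Collect pairwise and per-snippet metrics for a refactoring."""
    ratio = SequenceMatcher(None, original, refactored).ratio()
    return {
        "distance": levenshtein(original, refactored),
        "similarity_pct": round(100 * ratio, 2),  # assumption: ratio as a percentage
        "original": size_metrics(original),
        "refactored": size_metrics(refactored),
    }


if __name__ == "__main__":
    before = "total = 0\nfor x in items:\n    total = total + x\n"
    after = "total = sum(items)\n"
    print(compare(before, after))
```

Under these assumptions, a low distance together with a moderate similarity percentage would correspond to the "minimally disruptive" refactoring the abstract describes for Python. Compilability, correctness, and cyclomatic complexity require a compiler, a test suite, and a parser respectively, so they are left out of this sketch.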
