Evolving Triple Knowledge-Augmented LLMs for Code Translation in Repository Context

Large language models (LLMs) perform well on function-level code translation when no repository-level context is involved. However, their performance on code translation in repository-level context remains suboptimal because of complex dependencies and context, which hinders their adoption in industrial settings. In this work, we propose K-Trans, a novel LLM-based code translation technique that leverages triple knowledge augmentation to improve the LLM's translation quality under repository context in real-world software development. First, K-Trans constructs an evolving translation knowledge base by extracting relevant information from target-language codebases, the repository being translated, and prior translation results. Second, for each function to be translated, K-Trans retrieves the relevant triple knowledge (target-language code samples, dependency usage examples, and successful translation function pairs), which serves as reference material for the LLM during translation. Third, K-Trans constructs a knowledge-augmented translation prompt from the retrieved triple knowledge and employs LLMs to generate the translated code while preserving repository context. It further uses LLMs for self-debugging, which improves translation correctness. Lastly, K-Trans continuously evolves the translation knowledge base. The experiments show that K-Trans substantially outperforms the baseline adapted from previous work, with 19.4%/40.2% relative improvement in pass@1 and a gain of 0.138 in CodeBLEU. The results also demonstrate that each type of knowledge contributes significantly to K-Trans's effectiveness in repository-level context code translation, with dependency usage examples making the most notable contribution. Moreover, as the self-evolution process progresses, the knowledge base continuously improves the LLM's performance across various aspects of repository-level code translation.
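The four steps described in the abstract (build the knowledge base, retrieve triple knowledge, prompt and self-debug, evolve the knowledge base) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the K-Trans implementation: the names `KnowledgeBase`, `build_prompt`, and `translate_function`, the Jaccard token-overlap retriever, the prompt wording, and the `llm` and `check` callables are all hypothetical stand-ins for components the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def tokens(text: str) -> set:
    """Split text into a set of lowercase alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Jaccard token overlap (assumed retriever; the real one is unspecified)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class KnowledgeBase:
    """Hypothetical container for the three kinds of knowledge."""
    target_samples: List[str] = field(default_factory=list)
    dependency_usages: List[str] = field(default_factory=list)
    translation_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve(self, source_fn: str, k: int = 1):
        """Return the top-k entries of each knowledge type for source_fn."""
        def top(items, key):
            ranked = sorted(items, key=lambda x: similarity(source_fn, key(x)),
                            reverse=True)
            return ranked[:k]
        return (top(self.target_samples, lambda s: s),
                top(self.dependency_usages, lambda s: s),
                top(self.translation_pairs, lambda p: p[0]))


def build_prompt(source_fn: str, knowledge) -> str:
    """Assemble a knowledge-augmented prompt (wording is an assumption)."""
    samples, usages, pairs = knowledge
    parts = ["Translate the function below into the target language."]
    if samples:
        parts.append("Target-language code samples:\n" + "\n".join(samples))
    if usages:
        parts.append("Dependency usage examples:\n" + "\n".join(usages))
    if pairs:
        parts.append("Successful translation pairs:\n"
                     + "\n".join(f"{s}\n=>\n{t}" for s, t in pairs))
    parts.append("Function to translate:\n" + source_fn)
    return "\n\n".join(parts)


def translate_function(source_fn: str,
                       kb: KnowledgeBase,
                       llm: Callable[[str], str],
                       check: Callable[[str], Optional[str]],
                       max_debug_rounds: int = 2) -> Tuple[str, bool]:
    """Translate one function, self-debug on failure, evolve kb on success.

    `check` returns None when the candidate passes, else an error message.
    """
    prompt = build_prompt(source_fn, kb.retrieve(source_fn))
    candidate = llm(prompt)
    error = check(candidate)
    rounds = 0
    while error is not None and rounds < max_debug_rounds:
        # Self-debugging: feed the failed attempt and its error back to the LLM.
        candidate = llm(f"{prompt}\n\nPrevious attempt:\n{candidate}\n"
                        f"Error:\n{error}\nFix the translation.")
        error = check(candidate)
        rounds += 1
    if error is None:
        # Evolution step: a verified pair becomes retrievable knowledge.
        kb.translation_pairs.append((source_fn, candidate))
    return candidate, error is None
```

In this sketch, `llm` would wrap a model call and `check` would wrap compilation or test execution in the target repository. Each successful translation is appended to `translation_pairs`, so later calls to `retrieve` can return it as a reference, which mirrors the evolving behaviour the abstract describes.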
