Large language models (LLMs) perform well in function-level code translation when no repository-level context is involved. However, their performance on code translation with repository-level context remains suboptimal because of complex dependencies and context, which hinders their adoption in industrial settings. In this work, we propose K-Trans, a novel LLM-based code translation technique that leverages triple knowledge augmentation to improve an LLM's translation quality under repository context in real-world software development. First, K-Trans constructs an evolving translation knowledge base by extracting relevant information from target-language codebases, the repository being translated, and prior translation results. Second, for each function to be translated, K-Trans retrieves relevant triple knowledge, consisting of target-language code samples, dependency usage examples, and successful translation function pairs, which serve as references for the LLM during translation. Third, K-Trans constructs a knowledge-augmented translation prompt from the retrieved triple knowledge and employs LLMs to generate the translated code while preserving repository context. It further leverages LLMs for self-debugging, which improves translation correctness. Lastly, K-Trans continuously evolves the translation knowledge base. Experiments show that K-Trans substantially outperforms a baseline adapted from previous work, with relative improvements of 19.4%/40.2% in pass@1 and a gain of 0.138 in CodeBLEU. The results also demonstrate that each type of knowledge contributes significantly to K-Trans's effectiveness in handling code translation with repository-level context, with dependency usage examples making the most notable contribution. Moreover, as the self-evolution process progresses, the knowledge base continuously enhances the LLM's performance across various aspects of repository-level code translation.
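The four steps described in the abstract (build a knowledge base, retrieve triple knowledge, prompt and self-debug, evolve the knowledge base) can be illustrated with a minimal Python sketch. This is not the K-Trans implementation: the names `KnowledgeBase`, `build_prompt`, and `translate` are hypothetical, the token-overlap retriever is a simple stand-in for whatever retrieval method is actually used, and the `llm` and `check` callables are placeholders that the caller must supply (for example, a model client and a compile-and-test harness).

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def tokens(code: str) -> set:
    """Split code into a set of lowercase identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", code.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard token overlap; an assumed stand-in for the real retriever."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


@dataclass
class KnowledgeBase:
    """Hypothetical container for the three kinds of knowledge."""
    target_samples: List[str] = field(default_factory=list)
    dependency_usages: List[str] = field(default_factory=list)
    translation_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve(self, source_fn: str, k: int = 1):
        """Return the top-k entries of each knowledge type for source_fn."""
        def top(items, key):
            ranked = sorted(items, key=lambda it: similarity(source_fn, key(it)),
                            reverse=True)
            return ranked[:k]
        return (top(self.target_samples, lambda s: s),
                top(self.dependency_usages, lambda s: s),
                top(self.translation_pairs, lambda p: p[0]))


def build_prompt(source_fn: str, samples, usages, pairs, feedback: str = "") -> str:
    """Assemble a knowledge-augmented prompt (the wording is illustrative)."""
    parts = ["Translate the function below into the target language."]
    parts += [f"Target-language sample:\n{s}" for s in samples]
    parts += [f"Dependency usage example:\n{u}" for u in usages]
    parts += [f"Prior translation:\n{src}\n=>\n{tgt}" for src, tgt in pairs]
    if feedback:
        parts.append(f"Previous attempt failed with:\n{feedback}")
    parts.append(f"Function to translate:\n{source_fn}")
    return "\n\n".join(parts)


def translate(source_fn: str, kb: KnowledgeBase,
              llm: Callable[[str], str], check: Callable[[str], str],
              max_debug_rounds: int = 2) -> Optional[str]:
    """Translate one function, self-debug on failure, and evolve the KB.

    `check` returns an empty string on success or an error message otherwise.
    """
    samples, usages, pairs = kb.retrieve(source_fn)
    feedback = ""
    for _ in range(max_debug_rounds + 1):
        candidate = llm(build_prompt(source_fn, samples, usages, pairs, feedback))
        feedback = check(candidate)
        if not feedback:
            # Evolution step: a successful pair becomes future reference knowledge.
            kb.translation_pairs.append((source_fn, candidate))
            return candidate
    return None  # all attempts failed; nothing is added to the knowledge base
```

In this sketch the knowledge base grows only with translations that pass `check`, which mirrors the abstract's notion of reusing successful translation function pairs. The bound on debugging rounds is an assumption, not a value taken from the paper.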