Advancing Automated In-Isolation Validation in Repository-Level Code Translation

Repository-level code translation aims to automatically migrate entire repositories across programming languages while preserving functionality. Despite recent advances, validating the resulting translations remains challenging. This paper proposes TRAM, which combines context-aware type resolution with mock-based in-isolation validation to achieve high-quality translations between programming languages. Prior to translation, TRAM retrieves API documentation and contextual code information for each variable type in the source language. It then prompts a large language model (LLM) with the retrieved context to resolve type mappings across languages with precise semantic interpretations. Using the automatically constructed type mapping, TRAM employs a custom serialization/deserialization workflow that automatically constructs equivalent mock objects in the target language. This enables each method fragment to be validated in isolation, avoiding both the high cost of agent-based translation validation and the heavy manual effort required by existing approaches that rely on language interoperability. TRAM demonstrates state-of-the-art performance in Java-to-Python translation, underscoring the effectiveness of integrating retrieval-augmented generation (RAG)-based type resolution with reliable in-isolation validation.
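To make the in-isolation validation idea concrete, the following is a minimal Python sketch and not TRAM's actual implementation. The abstract does not specify the serialization format, so JSON is assumed here. The names `TYPE_MAP`, `rebuild`, `validate_in_isolation`, and the recorded example are all hypothetical. The type mapping is hard-coded for illustration, whereas TRAM resolves it with an LLM and retrieved context.

```python
import json
from typing import Any, Callable

# Hypothetical Java-to-Python type mapping. In TRAM this mapping is resolved
# by an LLM using retrieved context; it is hard-coded here for illustration.
TYPE_MAP: dict[str, Callable[[Any], Any]] = {
    "java.lang.Integer": int,
    "java.lang.String": str,
    "java.util.ArrayList": lambda items: [rebuild(i) for i in items],
    "java.util.HashMap": lambda entries: {rebuild(k): rebuild(v) for k, v in entries},
}


def rebuild(node: dict) -> Any:
    """Deserialize one {"type": ..., "value": ...} node into a Python object."""
    java_type = node["type"]
    if java_type not in TYPE_MAP:
        raise KeyError(f"no type mapping for {java_type}")
    return TYPE_MAP[java_type](node["value"])


def validate_in_isolation(translated: Callable[..., Any], recorded_json: str) -> bool:
    """Run a translated fragment on rebuilt inputs and compare with the recorded output."""
    record = json.loads(recorded_json)
    args = [rebuild(a) for a in record["args"]]
    expected = rebuild(record["expected"])
    return translated(*args) == expected


# Translated fragment under test, assumed to come from a Java method such as
# `int totalLength(List<String> words)`.
def total_length(words):
    return sum(len(w) for w in words)


# Assumed record of one source-side call: serialized inputs and observed output.
RECORDED = json.dumps({
    "args": [{"type": "java.util.ArrayList", "value": [
        {"type": "java.lang.String", "value": "foo"},
        {"type": "java.lang.String", "value": "quux"},
    ]}],
    "expected": {"type": "java.lang.Integer", "value": 7},
})

result = validate_in_isolation(total_length, RECORDED)
```

Because the inputs and expected output are rebuilt from serialized data, the translated fragment can be checked without translating or running the rest of the repository, which is the property the abstract attributes to mock-based validation.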
