The convergence of deep learning and formal mathematics has spurred research in formal verification. Statement autoformalization, a crucial first step in this process, aims to translate informal mathematical descriptions into machine-verifiable representations but remains a significant challenge. The core difficulty is that existing methods often lack contextual awareness, causing them to hallucinate formal definitions and theorems. Furthermore, current retrieval-augmented approaches exhibit poor precision and recall when retrieving formal library dependencies, and they lack the scalability to exploit ever-growing public datasets. To bridge this gap, we propose a novel retrieval-augmented framework for statement autoformalization based on DDR (\textit{Direct Dependency Retrieval}). Our DDR method generates candidate library dependencies directly from natural-language mathematical descriptions and then verifies their existence in the formal library via an efficient suffix-array check. Leveraging this search mechanism, we construct a dependency retrieval dataset of over 500,000 samples and fine-tune a high-precision DDR model. Experimental results demonstrate that our DDR model significantly outperforms SOTA methods in both retrieval precision and recall. Consequently, an autoformalizer equipped with DDR achieves consistently higher single-attempt accuracy and multi-attempt stability than models using traditional selection-based RAG methods.
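The suffix-array check mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the formal library's identifiers are concatenated (with a separator byte to prevent matches spanning two names) into one corpus string, over which a suffix array supports binary-search substring lookup for each candidate dependency the model generates. The identifier names below are hypothetical examples.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive O(n^2 log n) construction by sorting suffix start positions;
    # sufficient for a sketch (production code would use SA-IS or similar).
    return sorted(range(len(text)), key=lambda i: text[i:])

def contains(text: str, sa: list[int], pattern: str) -> bool:
    # Lower-bound binary search: find the first suffix whose prefix is >= pattern,
    # then check whether that suffix actually starts with pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:].startswith(pattern)

# Hypothetical library identifiers; "\x00" keeps matches inside one name.
library = ["Nat.add_comm", "Real.sqrt_nonneg", "List.map_append"]
corpus = "\x00".join(library)
sa = build_suffix_array(corpus)

print(contains(corpus, sa, "Real.sqrt_nonneg"))  # existing dependency -> True
print(contains(corpus, sa, "Real.sqrt_pos"))     # hallucinated name -> False
```

Each lookup costs O(m log n) for a pattern of length m over a corpus of length n, so every candidate dependency a model emits can be validated against the library without any embedding index or retrieval model.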