Automatically synthesizing verifiable code from natural language requirements ensures software correctness and reliability while significantly lowering the barrier to adopting the techniques of formal methods. With the rise of large language models (LLMs), long-standing efforts at autoformalization have gained new momentum. However, existing approaches suffer from severe syntactic and semantic errors due to the scarcity of domain-specific pre-training corpora and often fail to formalize implicit knowledge effectively. In this paper, we propose AutoICE, an LLM-driven evolutionary search for synthesizing verifiable C code. It introduces the diverse individual initialization and the collaborative crossover to enable diverse iterative updates, thereby mitigating error propagation inherent in single-agent iterations. Besides, it employs the self-reflective mutation to facilitate the discovery of implicit knowledge. Evaluation results demonstrate the effectiveness of AutoICE: it successfully verifies $90.36$\% of code, outperforming the state-of-the-art (SOTA) approach. Besides, on a developer-friendly dataset variant, AutoICE achieves a $88.33$\% verification success rate, significantly surpassing the $65$\% success rate of the SOTA approach.
翻译:从自然语言需求自动合成可验证的代码,能够确保软件的正确性和可靠性,同时显著降低采用形式化方法技术的门槛。随着大语言模型(LLMs)的兴起,长期以来的自动形式化努力获得了新的动力。然而,由于领域特定预训练语料的稀缺,现有方法存在严重的语法和语义错误,且往往无法有效形式化隐含知识。本文提出AutoICE,一种基于大语言模型驱动的进化搜索方法,用于合成可验证的C代码。它引入了多样化个体初始化和协作交叉操作,以实现多样化的迭代更新,从而缓解单智能体迭代中固有的错误传播问题。此外,该方法采用自反思变异机制,以促进隐含知识的发现。评估结果证明了AutoICE的有效性:它成功验证了$90.36$\%的代码,优于当前最先进(SOTA)方法。此外,在一个开发者友好的数据集变体上,AutoICE实现了$88.33$\%的验证成功率,显著超越了SOTA方法$65$\%的成功率。