Code generation, the task of creating executable programs from natural language requirements, has recently advanced substantially through Chain-of-Thought (CoT) reasoning, which enables Large Language Models (LLMs) to develop high-level reasoning plans before writing code. Recent research has proposed various methods to enhance models' CoT reasoning for code generation, such as prompt engineering and supervised fine-tuning. However, existing approaches still face three critical limitations: (1) limited exploration of diverse reasoning paths, which constrains generalization across programming scenarios; (2) lack of quality assessment for intermediate reasoning steps, which hampers the reliability of the generated plans and code; and (3) the negative impact of "overthinking", which can lead to unnecessarily complex and incorrect solutions. To address these limitations, we frame CoT code generation as a decision-making problem and present SEER, a SElf-Exploring deep Reasoning framework that enables accurate and adaptive reasoning for code generation. SEER introduces three key components: (1) diverse reasoning path exploration, which explores diverse reasoning paths and annotates intermediate steps without relying on human experts or closed-source proprietary models; (2) reasoning quality-aware model training, which trains a policy model to generate candidate reasoning steps and a value model to assess their quality; and (3) adaptive CoT reasoning, which dynamically switches between direct generation and step-by-step reasoning depending on the problem.
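To make the interplay between a policy model, a value model, and adaptive switching more concrete, the following is a minimal illustrative sketch, not SEER's actual algorithm. It assumes a hypothetical `policy` callable that proposes candidate next steps and a hypothetical `value` callable that scores a partial plan in [0, 1]; the greedy selection rule, the `direct_threshold` parameter, and the toy stand-in models are all assumptions introduced here for illustration.

```python
from typing import Callable, List

# Hypothetical interfaces: none of these names come from the SEER paper.
Policy = Callable[[str, List[str]], List[str]]  # proposes candidate next steps
Value = Callable[[str, List[str]], float]       # scores a partial plan in [0, 1]


def adaptive_plan(problem: str, policy: Policy, value: Value,
                  direct_threshold: float = 0.8,
                  max_steps: int = 4) -> List[str]:
    """Return a reasoning plan; an empty plan means 'generate code directly'."""
    # Assumed switching rule: if the value model already rates the empty plan
    # highly, skip step-by-step reasoning altogether.
    if value(problem, []) >= direct_threshold:
        return []
    plan: List[str] = []
    for _ in range(max_steps):
        candidates = policy(problem, plan)
        if not candidates:
            break
        # Greedy choice: keep the candidate whose extended plan scores best.
        best = max(candidates, key=lambda step: value(problem, plan + [step]))
        # Stop when no candidate improves on the current plan.
        if value(problem, plan + [best]) <= value(problem, plan):
            break
        plan.append(best)
    return plan


# Toy stand-ins so the sketch runs end to end; real models would be LLMs.
def toy_policy(problem: str, plan: List[str]) -> List[str]:
    steps = ["parse input", "sort items", "emit result"]
    return [s for s in steps if s not in plan]


def toy_value(problem: str, plan: List[str]) -> float:
    if "hard" not in problem:
        return 0.9  # easy problems look solvable without a plan
    return min(1.0, 0.2 + 0.25 * len(plan))


easy_plan = adaptive_plan("easy: add two numbers", toy_policy, toy_value)
hard_plan = adaptive_plan("hard: schedule tasks", toy_policy, toy_value)
```

With these toy models, the easy problem yields an empty plan (direct generation), while the hard problem accumulates steps one at a time. A greedy loop is only one possible design; the abstract does not specify the search strategy or the switching criterion that SEER uses.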