Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought prompting and self-consistency. However, ensemble-based approaches, especially self-consistency, which aggregates multiple reasoning trajectories, often incur substantial computational overhead. To improve efficiency, prior work has leveraged internal confidence signals; early-stopping strategies such as DeepConf reduce cost by terminating low-confidence trajectories. Yet this strategy discards partially completed reasoning paths, wasting the computation already spent on them. We propose reflective confidence, a novel reasoning framework that turns low-confidence signals from termination indicators into reflection triggers. When confidence falls below a threshold, instead of halting generation, the framework injects a reflection prompt that directs the model to analyze its current reasoning state, identify potential errors, and continue along a corrected trajectory. Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over strong early-stopping baselines at comparable computational cost, validating the effectiveness of proactive self-correction over passive discarding.
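To make the mechanism concrete, the sketch below shows a minimal decoding loop implied by this description: generation proceeds step by step, and a confidence drop triggers an injected reflection prompt rather than termination. All names here (generate_step, Trajectory, CONF_THRESHOLD, REFLECT_PROMPT) and the threshold and reflection-cap values are illustrative assumptions, not the paper's actual implementation or any library API.

```python
# Minimal sketch of reflection-triggered decoding, under the assumptions stated above.
from dataclasses import dataclass

CONF_THRESHOLD = 0.7   # assumed confidence threshold (a tunable hyperparameter)
MAX_REFLECTIONS = 2    # assumed cap on reflections per trajectory
REFLECT_PROMPT = (
    "\nWait, let me re-examine the reasoning so far, check for errors, "
    "and correct the approach before continuing.\n"
)

@dataclass
class Trajectory:
    text: str = ""
    reflections: int = 0
    finished: bool = False

def generate_step(prompt: str, so_far: str) -> tuple[str, float, bool]:
    """Stand-in for one decoding step of an LLM.

    Returns (next_chunk, confidence in [0, 1], done_flag); a real implementation
    would derive the confidence from token log-probabilities, as in DeepConf.
    """
    raise NotImplementedError

def reflective_decode(prompt: str) -> Trajectory:
    """Generate one reasoning trajectory with reflection instead of early stopping."""
    traj = Trajectory()
    while not traj.finished:
        chunk, conf, done = generate_step(prompt, traj.text)
        traj.text += chunk
        traj.finished = done
        # A DeepConf-style early-stopping policy would terminate the trajectory here
        # when confidence is low; reflective confidence instead appends a reflection
        # prompt and lets generation continue along a corrected path.
        if not done and conf < CONF_THRESHOLD and traj.reflections < MAX_REFLECTIONS:
            traj.text += REFLECT_PROMPT
            traj.reflections += 1
    return traj
```

In this sketch the partially generated text is kept and extended after each reflection, which is the point of contrast with early stopping: no trajectory, and no computation already spent on it, is thrown away.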