Despite advances in mathematical reasoning capabilities, Large Language Models (LLMs) still struggle with calculation verification when using established prompting techniques. We present MDToC (Metacognitive Dynamic Tree of Concepts), a three-phase approach that constructs a concept tree, develops accuracy-verified calculations for each concept, and employs majority voting to evaluate competing solutions. Evaluations across the CHAMP, MATH, and Game-of-24 benchmarks demonstrate MDToC's effectiveness: with GPT-4-Turbo, it achieves 58.1\% on CHAMP, 86.6\% on MATH, and 85\% on Game-of-24, outperforming GoT by 5\%, 5.4\%, and 4\% on these tasks, respectively, without hand-engineered hints. MDToC consistently surpasses existing prompting methods across all backbone models, yielding improvements of up to 7.6\% over ToT and 6.2\% over GoT, establishing metacognitive calculation verification as a promising direction for enhanced mathematical reasoning.
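The final phase, majority voting over competing solutions, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper `majority_vote` and the example candidate answers are assumptions for demonstration, showing how the most frequent final answer among independently generated solution paths would be selected.

```python
from collections import Counter

def majority_vote(candidate_answers):
    """Return the most common final answer among competing solution paths.

    Hypothetical helper illustrating the voting step only; MDToC's
    actual aggregation over the concept tree may differ.
    """
    if not candidate_answers:
        return None
    counts = Counter(candidate_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Five hypothetical solution paths for a Game-of-24 instance (3, 4, 8, 9):
paths = ["(8-4)*(9-3)", "(8-4)*(9-3)", "8*3", "(8-4)*(9-3)", "9*4-12"]
print(majority_vote(paths))  # prints "(8-4)*(9-3)"
```

Ties and empty candidate sets would need a policy in practice (e.g. falling back to a verifier score); `Counter.most_common` breaks ties by insertion order.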