MDToC：基于元认知动态概念树提升大语言模型数学问题求解能力 (MDToC: Metacognitive Dynamic Tree of Concepts for Boosting Mathematical Problem-Solving of Large Language Models) - 专知论文

会员服务 ·

0

数学 · 问题求解 · 语言模型 · 数学推理 · 大语言模型 ·

MDToC: Metacognitive Dynamic Tree of Concepts for Boosting Mathematical Problem-Solving of Large Language Models

翻译：MDToC：基于元认知动态概念树提升大语言模型数学问题求解能力

Tung Duong Ta,Tim Oates

Despite advances in mathematical reasoning capabilities, Large Language Models (LLMs) still struggle with calculation verification when using established prompting techniques. We present MDToC (Metacognitive Dynamic Tree of Concepts), a three-phase approach that constructs a concept tree, develops accuracy-verified calculations for each concept, and employs majority voting to evaluate competing solutions. Evaluations across CHAMP, MATH, and Game-of-24 benchmarks demonstrate our MDToC's effectiveness, with GPT-4-Turbo achieving 58.1\% on CHAMP, 86.6\% on MATH, and 85\% on Game-of-24 - outperforming GoT by 5\%, 5.4\%, and 4\% on all these tasks, respectively, without hand-engineered hints. MDToC consistently surpasses existing prompting methods across all backbone models, yielding improvements of up to 7.6\% over ToT and 6.2\% over GoT, establishing metacognitive calculation verification as a promising direction for enhanced mathematical reasoning.

翻译：尽管大语言模型（LLMs）的数学推理能力已取得进展，但在使用现有提示技术时仍面临计算验证的困难。本文提出MDToC（元认知动态概念树），该方法通过三阶段流程构建概念树，为每个概念开发经准确性验证的计算过程，并采用多数投票机制评估竞争性解决方案。在CHAMP、MATH和Game-of-24基准测试中的评估表明，MDToC方法显著有效：GPT-4-Turbo在CHAMP上达到58.1%，在MATH上达到86.6%，在Game-of-24上达到85%的准确率，在所有任务中分别优于GoT方法5%、5.4%和4%，且无需人工设计的提示。MDToC在所有骨干模型上均持续超越现有提示方法，较ToT提升最高达7.6%，较GoT提升6.2%，这确立了元认知计算验证作为增强数学推理能力的有效研究方向。

0

相关内容

数学是关于数量、结构、变化等主题的探索。

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

23+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【SIGIR2020-中科院计算所】L2R2: 利用排名进行外展推理，L2R2: Leveraging Ranking for Abductive Reasoning

【SIGIR2020-中科院计算所】L2R2: 利用排名进行外展推理，L2R2: Leveraging Ranking for Abductive Reasoning

专知会员服务

11+阅读 · 2020年5月25日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

专知

11+阅读 · 2021年4月23日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

SDN数据平面中大规模流表的高性能查找方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

模糊认知集群优化的聚类算法

国家自然科学基金

8+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

MMSRARec: Summarization and Retrieval Augumented Sequential Recommendation Based on Multimodal Large Language Model

Arxiv

0+阅读 · 12月24日

ART: Adaptive Response Tuning Framework -- A Multi-Agent Tournament-Based Approach to LLM Response Optimization

Arxiv

0+阅读 · 12月24日

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

Arxiv

0+阅读 · 12月23日

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Arxiv

0+阅读 · 12月21日

RBCTest: Leveraging LLMs to Mine and Verify Oracles of API Response Bodies for RESTful API Testing

Arxiv

0+阅读 · 12月19日

VIP会员

文章信息

相关主题

大语言模型

相关VIP内容

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

23+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【SIGIR2020-中科院计算所】L2R2: 利用排名进行外展推理，L2R2: Leveraging Ranking for Abductive Reasoning

【SIGIR2020-中科院计算所】L2R2: 利用排名进行外展推理，L2R2: Leveraging Ranking for Abductive Reasoning

专知会员服务

11+阅读 · 2020年5月25日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

热门VIP内容

开通专知VIP会员享更多权益服务

《用于水文建模应用的美国空军全球空陆天气开发模型数据流程：GALWEM采集系统v1.0与v2.0概述》最新报告

《MERLIN：面向推广资源与研究的国家数据管理平台》报告

人工智能与未来指挥

《未来自主协作系统的指挥与控制——2025年度报告》报告

相关资讯

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

最新最全《深度元学习》2021综述论文，68页pdf，A Survey of Deep Meta-Learning

专知

11+阅读 · 2021年4月23日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

相关论文

MMSRARec: Summarization and Retrieval Augumented Sequential Recommendation Based on Multimodal Large Language Model

Arxiv

0+阅读 · 12月24日

ART: Adaptive Response Tuning Framework -- A Multi-Agent Tournament-Based Approach to LLM Response Optimization

Arxiv

0+阅读 · 12月24日

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

Arxiv

0+阅读 · 12月23日

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Arxiv

0+阅读 · 12月21日

RBCTest: Leveraging LLMs to Mine and Verify Oracles of API Response Bodies for RESTful API Testing

Arxiv

0+阅读 · 12月19日

相关基金

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

SDN数据平面中大规模流表的高性能查找方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

模糊认知集群优化的聚类算法

国家自然科学基金

8+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员