When do you need Chain-of-Thought Prompting for ChatGPT? - 专知论文


CoT · ChatGPT · training data · datasets · language models

April 6, 2023

When do you need Chain-of-Thought Prompting for ChatGPT?


Jiuhai Chen, Lichang Chen, Heng Huang, Tianyi Zhou

Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models (LLMs). For example, simply adding the CoT instruction "Let's think step-by-step" to each input query of the MultiArith dataset improves GPT-3's accuracy from 17.7% to 78.7%. However, it is not clear whether CoT is still effective on more recent instruction-finetuned (IFT) LLMs such as ChatGPT. Surprisingly, on ChatGPT, CoT is no longer effective for certain tasks such as arithmetic reasoning, while it remains effective on other reasoning tasks. Moreover, on the former tasks, ChatGPT usually achieves the best performance and can generate CoT even without being instructed to do so. Hence, it is plausible that ChatGPT has already been trained on these tasks with CoT and has thus memorized the instruction, so it implicitly follows that instruction when applied to the same queries, even without CoT. Our analysis reflects a potential risk of overfitting/bias toward instructions introduced in IFT, which is becoming more common in training LLMs. In addition, it indicates possible leakage of the pretraining recipe, e.g., one can verify whether a dataset and instruction were used in training ChatGPT. Our experiments report new baseline results of ChatGPT on a variety of reasoning tasks and shed novel insights into LLM profiling, instruction memorization, and pretraining dataset leakage.

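The comparison described in the abstract (the same query with and without the CoT instruction) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: `build_prompt`, `extract_last_number`, `accuracy`, and `toy_model` are hypothetical names, the answer-extraction rule (take the last number in the output) is an assumption, and `toy_model` is a hard-coded stand-in that says nothing about how GPT-3 or ChatGPT actually behave.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# CoT instruction quoted in the abstract.
COT_INSTRUCTION = "Let's think step-by-step"


def build_prompt(question: str, use_cot: bool) -> str:
    """Return the bare query, or the query followed by the CoT instruction."""
    question = question.strip()
    return f"{question}\n{COT_INSTRUCTION}" if use_cot else question


def extract_last_number(text: str) -> Optional[str]:
    """Assumed extraction rule: treat the last number in the output as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def accuracy(
    model: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
    use_cot: bool,
) -> float:
    """Fraction of (question, gold_answer) pairs the model answers correctly."""
    items = list(dataset)
    if not items:
        return 0.0
    correct = sum(
        extract_last_number(model(build_prompt(question, use_cot))) == gold
        for question, gold in items
    )
    return correct / len(items)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in: answers correctly only when the instruction is present."""
    if COT_INSTRUCTION in prompt:
        return "3 + 4 = 7. The answer is 7"
    return "The answer is 8"


if __name__ == "__main__":
    data = [("Tom has 3 apples and buys 4 more. How many apples does he have?", "7")]
    print(accuracy(toy_model, data, use_cot=False))  # 0.0
    print(accuracy(toy_model, data, use_cot=True))   # 1.0
```

To run the comparison against a real model, `toy_model` would be replaced by any callable that maps a prompt string to the model's text output. A near-zero gap between the two accuracies on a task would correspond to the "CoT is no longer effective" case the abstract reports for ChatGPT on arithmetic reasoning.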

