How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI - Zhuanzhi Papers

December 21, 2021

How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark

From arXiv. Accepted for publication at EMNLP 2021; 11 pages, 5 tables, 4 figures.

Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because their precise computation is either impractical or impossible. For example, "How much would the sea level rise if all ice in the world melted?" FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans. To do the same for AI systems, we present two datasets: 1) a collection of 1k real-world FPs sourced from quizzes and olympiads; and 2) a bank of 10k synthetic FPs of intermediate complexity to serve as a sandbox for the harder real-world challenge. In addition to question-answer pairs, the datasets contain detailed solutions in the form of an executable program and supporting facts, helping in supervision and evaluation of intermediate steps. We demonstrate that even extensively fine-tuned large-scale language models perform poorly on these datasets, on average making estimates that are off by two orders of magnitude. Our contribution is thus the crystallization of several unsolved AI problems into a single, new challenge that we hope will spur further advances in building systems that can reason.
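
To make the idea of a solution expressed as an executable program with supporting facts more concrete, here is a minimal Python sketch for the sea-level question quoted in the abstract. It is an illustration only and does not reproduce the datasets' actual program format. The fact values are rough, rounded numbers assumed for this example, and the names `FACTS`, `sea_level_rise_m`, and `orders_of_magnitude_off` are hypothetical. The error function simply measures how many powers of ten separate an estimate from a reference value; it is not claimed to be the paper's evaluation metric.

```python
import math

# Supporting facts: rough, rounded values assumed for illustration only.
# They are not taken from the paper or its datasets.
FACTS = {
    "land_ice_volume_km3": 3.0e7,  # assumed volume of ice sheets and glaciers
    "ice_to_water_ratio": 0.9,     # assumed: ice is roughly 10% less dense than water
    "ocean_area_km2": 3.6e8,       # assumed surface area of the oceans
}


def sea_level_rise_m(facts):
    """Estimate the sea level rise in metres if all land ice melted.

    The estimate spreads the meltwater volume evenly over the ocean surface
    and ignores effects such as thermal expansion or changing coastlines.
    """
    water_volume_km3 = facts["land_ice_volume_km3"] * facts["ice_to_water_ratio"]
    rise_km = water_volume_km3 / facts["ocean_area_km2"]
    return rise_km * 1000.0  # convert kilometres to metres


def orders_of_magnitude_off(estimate, reference):
    """Return the absolute base-10 log ratio of two positive quantities."""
    return abs(math.log10(estimate / reference))


if __name__ == "__main__":
    estimate = sea_level_rise_m(FACTS)
    print(f"Estimated sea level rise: {estimate:.0f} m")
    # An estimate 100 times too large is two orders of magnitude off.
    print(orders_of_magnitude_off(estimate * 100, estimate))
```

With the assumed facts the program estimates a rise of about 75 m. Keeping the facts separate from the arithmetic lets each intermediate step be checked on its own, which is the kind of supervision of intermediate steps the abstract describes.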

