Towards a Benchmark for Scientific Understanding in Humans and Machines - Zhuanzhi (专知) Papers

Benchmark · Quality control · Performance evaluation · Utility · Intelligent systems

April 20, 2023

Towards a Benchmark for Scientific Understanding in Humans and Machines

Kristian Gonzalez Barman, Sascha Caron, Tom Claassen, Henk de Regt

Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances. The Scientific Understanding Benchmark (SUB), which is formed by a set of these tests, allows for the evaluation and comparison of different approaches. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding we can improve their utility, ultimately advancing scientific understanding and helping to discover new insights within machines.
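The three kinds of question described in the abstract (information retrieval, explanation, and inference about different circumstances) can be illustrated with a small sketch. This is not the authors' SUB implementation: the `Level`, `Question`, and `score` names, the toy questions, and the substring-based graders are all hypothetical, chosen only to show how a set of tests could be scored per level so that different agents can be compared.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Level(Enum):
    """The three kinds of question named in the abstract."""
    RETRIEVAL = "information retrieval"
    EXPLANATION = "explanation"
    COUNTERFACTUAL = "inference under different circumstances"


@dataclass(frozen=True)
class Question:
    level: Level
    prompt: str
    # Hypothetical grader: returns True if an answer is acceptable.
    # A substring check is a crude stand-in for real grading.
    accepts: Callable[[str], bool]


def score(agent: Callable[[str], str], questions: List[Question]) -> Dict[Level, float]:
    """Return the fraction of accepted answers per level for one agent."""
    totals = {level: 0 for level in Level}
    passed = {level: 0 for level in Level}
    for q in questions:
        totals[q.level] += 1
        if q.accepts(agent(q.prompt)):
            passed[q.level] += 1
    # Levels with no questions are left out rather than reported as zero.
    return {lvl: passed[lvl] / totals[lvl] for lvl in Level if totals[lvl]}


# Toy questions (assumptions for illustration, not taken from the paper).
QUESTIONS = [
    Question(Level.RETRIEVAL,
             "What is the boiling point of water at sea level in Celsius?",
             lambda a: "100" in a),
    Question(Level.EXPLANATION,
             "Why does water boil at a lower temperature on a mountain?",
             lambda a: "pressure" in a.lower()),
    Question(Level.COUNTERFACTUAL,
             "Would water boil above or below 100 C if ambient pressure doubled?",
             lambda a: "above" in a.lower()),
]


def lookup_agent(prompt: str) -> str:
    """Toy agent that returns one stored fact regardless of the question."""
    return "100 degrees Celsius"


if __name__ == "__main__":
    for level, fraction in score(lookup_agent, QUESTIONS).items():
        print(f"{level.value}: {fraction:.2f}")
```

Reporting one score per level, rather than a single total, keeps an agent that only retrieves facts distinguishable from one that can also explain and reason about changed conditions.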
