We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability. We find that while most recent models have near random-chance accuracy, the very largest GPT-3 model improves over random chance by almost 20 percentage points on average. However, on every one of the 57 tasks, the best models still need substantial improvements before they can reach expert-level accuracy. Models also have lopsided performance and frequently do not know when they are wrong. Worse, they still have near-random accuracy on some socially important subjects such as morality and law. By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.
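As a concrete illustration (not the paper's official evaluation code), the sketch below shows one way a macro-averaged multitask score of this kind could be computed: accuracy is measured per task and then averaged across tasks, so small and large tasks count equally, and the result can be compared against the ~25% random-chance baseline for four-choice questions. The task names and answer data here are purely illustrative.

```python
from typing import Dict, List, Tuple

def task_accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of questions answered correctly within a single task."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def multitask_accuracy(results: Dict[str, Tuple[List[str], List[str]]]) -> float:
    """Unweighted mean of per-task accuracies (macro average across tasks)."""
    per_task = [task_accuracy(preds, golds) for preds, golds in results.values()]
    return sum(per_task) / len(per_task)

# With four answer choices per question, random guessing yields about 25%
# accuracy, so a macro average near 45% is roughly 20 percentage points
# above chance.
RANDOM_CHANCE = 0.25

if __name__ == "__main__":
    # Hypothetical predictions/answers for two of the 57 tasks.
    example = {
        "elementary_mathematics": (["A", "C", "B", "D"], ["A", "B", "B", "D"]),
        "us_history": (["B", "B", "A"], ["B", "C", "A"]),
    }
    score = multitask_accuracy(example)
    print(f"multitask accuracy: {score:.3f} (random chance: {RANDOM_CHANCE:.2f})")
```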