In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.