Impossible Triangle: What's Next for Pre-trained Language Models?

Recent development of large-scale pre-trained language models (PLMs) has significantly improved the capability of models in various NLP tasks, in terms of performance both after task-specific fine-tuning and in zero-shot / few-shot learning. However, many such models come with a dauntingly huge size that few institutions can afford to pre-train, fine-tune, or even deploy, while moderate-sized models usually lack strong generalized few-shot learning capabilities. In this paper, we first elaborate the current obstacles to using PLMs in terms of the Impossible Triangle: 1) moderate model size, 2) state-of-the-art few-shot learning capability, and 3) state-of-the-art fine-tuning capability. We argue that all existing PLMs lack one or more properties from the Impossible Triangle. To remedy these missing properties, various techniques have been proposed, such as knowledge distillation, data augmentation, and prompt learning, which inevitably bring additional work to the application of PLMs in real scenarios. We then offer insights into future research directions of PLMs to achieve the Impossible Triangle, and break down the task into several key phases.

