LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

Transformers are widely used in NLP tasks. However, current approaches to leveraging transformers for language understanding expose one weak spot: number understanding. Numbers occur frequently in some scenarios, especially in semi-structured data such as tables. Yet current approaches to number-rich tasks with transformer-based language models abandon or lose some of the numeracy information - e.g., by breaking numbers into sub-word tokens - which leads to many number-related errors. In this paper, we propose the LUNA framework, which improves the numerical reasoning and calculation capabilities of transformer-based language models. With the number plugin of NumTok and NumBed, LUNA represents each number as a whole in the model input. With number pre-training, including regression loss and model distillation, LUNA bridges the gap between number and vocabulary embeddings. To the best of our knowledge, this is the first work that explicitly injects numeracy capability into language models using Number Plugins. Besides evaluating toy models on toy tasks, we evaluate LUNA on three large-scale transformer models (RoBERTa, BERT, TabBERT) over three different downstream tasks (TAT-QA, TabFact, CrediTrans), and observe that LUNA consistently improves the performance of these language models. The augmented models also improve on the official TAT-QA baseline (EM: 50.15 -> 59.58) and achieve SOTA performance on CrediTrans (F1 = 86.17).
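To illustrate what "representing each number as a whole" can mean at the input level, here is a minimal sketch. It is not the authors' NumTok or NumBed: the placeholder name `[NUM]`, the regular expression, and the function `split_numbers` are all assumptions made for illustration. The sketch replaces every non-negative number in a string with a single placeholder token and keeps the parsed value alongside, so that a separate number embedding could consume it instead of a sequence of sub-word pieces.

```python
import re

# Hypothetical placeholder token; the name "[NUM]" is an assumption, not from the paper.
NUM_TOKEN = "[NUM]"

# Matches non-negative numbers: thousands-separated (1,200.5) or plain (59.58, 42).
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def split_numbers(text):
    """Replace each number with one placeholder and return the parsed values."""
    values = []

    def _sub(match):
        # Strip thousands separators before parsing the literal as a float.
        values.append(float(match.group().replace(",", "")))
        return NUM_TOKEN

    return NUMBER_RE.sub(_sub, text), values


masked, values = split_numbers("EM rose from 50.15 to 59.58 on 1,200 samples.")
print(masked)  # EM rose from [NUM] to [NUM] on [NUM] samples.
print(values)  # [50.15, 59.58, 1200.0]
```

A sub-word tokenizer would typically split a literal such as 59.58 into several pieces; keeping one placeholder per number, plus its value, is one simple way to avoid that loss before any learned number embedding is applied.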

