Pre-Training a Graph Recurrent Network for Language Representation

Transformer-based pre-trained models have advanced considerably in recent years and have become one of the most important backbones in natural language processing. Recent work suggests that the attention mechanism inside the Transformer may not be necessary: both convolutional neural networks and multi-layer perceptron based models have been investigated as Transformer alternatives. In this paper, we consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications, together with a sentence-level representation decoupled from other tokens. The original model performs well in domain-specific text classification under supervised training; however, its potential for learning transferable knowledge in a self-supervised way has not been fully exploited. We fill this gap by optimizing the architecture and verifying its effectiveness on more general language understanding tasks, in both English and Chinese. As for model efficiency, instead of the quadratic complexity of Transformer-based models, our model has linear complexity and performs more efficiently during inference. Moreover, we find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
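
To make the local token-level communication and the separate sentence-level representation more concrete, here is a minimal sketch of one message-passing step. It is not the paper's architecture. The function names (`grn_step`, `messages_per_step`, `attention_pairs`), the window of one neighbour on each side, and the parameter-free "average, then tanh" update are all assumptions chosen for illustration; a real model would use learned, gated updates. The sketch only shows why the cost of a step grows linearly with sequence length, whereas full self-attention scores every pair of tokens.

```python
import math


def grn_step(tokens, sentence):
    """One simplified message-passing step (illustrative assumption only).

    tokens:   list of n token states, each a list of d floats.
    sentence: sentence-level state, a list of d floats.
    Averaging followed by tanh stands in for a learned gated update.
    """
    n, d = len(tokens), len(sentence)
    zero = [0.0] * d
    new_tokens = []
    for i in range(n):
        left = tokens[i - 1] if i > 0 else zero        # zero padding at the edges
        right = tokens[i + 1] if i < n - 1 else zero
        # Each token reads a fixed number of inputs (itself, two neighbours,
        # the sentence node), so the work per token does not depend on n.
        new_tokens.append([
            math.tanh((tokens[i][k] + left[k] + right[k] + sentence[k]) / 4.0)
            for k in range(d)
        ])
    # The sentence node reads every token once per step.
    new_sentence = [
        math.tanh((sentence[k] + sum(t[k] for t in tokens)) / (n + 1))
        for k in range(d)
    ]
    return new_tokens, new_sentence


def messages_per_step(n, window=1):
    """Messages received in one step, counting padded edge slots: linear in n."""
    token_inputs = n * (2 * window + 1)  # neighbours plus the sentence node
    sentence_inputs = n                  # one message from each token
    return token_inputs + sentence_inputs


def attention_pairs(n):
    """Token pairs scored by full self-attention: quadratic in n."""
    return n * n


# Usage: run two steps on a toy sequence of four 2-dimensional token states.
tokens = [[0.1 * (i + 1), -0.1 * (i + 1)] for i in range(4)]
sentence = [0.0, 0.0]
for _ in range(2):
    tokens, sentence = grn_step(tokens, sentence)

for n in (128, 512, 2048):
    print(n, messages_per_step(n), attention_pairs(n))
```

In this sketch a distant token cannot influence another token within a single step. Its effect arrives one step later through the sentence-level state, which is why such a model applies the step repeatedly.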

