Attention Word Embedding

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as context to predict it. A limitation of CBOW is that it weights all context words equally when making a prediction, which is inefficient because some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform state-of-the-art word embedding models both on a variety of word similarity datasets and when used to initialize NLP models.
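The contrast between equal weighting and attention weighting of context words can be illustrated with a minimal sketch. It is not the AWE formulation, which the abstract does not spell out: it assumes generic dot-product attention with a softmax, and the toy vectors, the `query` vector, and all function names are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(vectors, weights):
    """Combine vectors using one scalar weight per vector."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]

def cbow_context(context_vectors):
    """CBOW-style context: every context word gets weight 1/n."""
    n = len(context_vectors)
    return weighted_sum(context_vectors, [1.0 / n] * n)

def attention_context(context_vectors, query):
    """Assumed attention variant: weights = softmax(query . vector)."""
    weights = softmax([dot(query, v) for v in context_vectors])
    return weighted_sum(context_vectors, weights), weights

# Hypothetical 2-d embeddings for three context words.
context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical query; how AWE derives its weights is not given here.
query = [2.0, 0.0]

uniform = cbow_context(context)
attended, weights = attention_context(context, query)
print("uniform context: ", uniform)
print("attention weights:", weights)
print("attended context:", attended)
```

In this toy setup the first context word aligns best with the query, so it receives the largest weight and pulls the context vector toward itself, whereas the uniform average treats all three words alike.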


Related content

Word vector representations

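A distributed representation encodes language as dense, low-dimensional, continuous vectors. Researchers found early on that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Word embedding methods can be trained directly on large unlabeled corpora. The quality of the embeddings also depends heavily on the choice of context window size. In general, embeddings learned with a large context window tend to reflect topical information, while embeddings learned with a small context window tend to reflect a word's function and its contextual semantics.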
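The analogy relation above can be checked by comparing difference vectors. The following is a minimal sketch with hand-made 3-d vectors, not trained embeddings: the `toy` table and its values are hypothetical and were chosen so that the third coordinate encodes plurality.

```python
import math

def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

# Hypothetical toy embeddings (not learned from data).
toy = {
    "apple":  [1.0, 0.0, 0.0],
    "apples": [1.0, 0.0, 1.0],
    "car":    [0.0, 1.0, 0.0],
    "cars":   [0.0, 1.0, 1.0],
}

# apple - apples and car - cars should point in the same direction.
diff_apple = sub(toy["apple"], toy["apples"])
diff_car = sub(toy["car"], toy["cars"])
similarity = cosine(diff_apple, diff_car)
print("cosine(apple - apples, car - cars) =", similarity)
```

With real embeddings the two difference vectors are only approximately parallel, so the cosine similarity would be high rather than exactly 1.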
