In recent years, researchers have obtained better word vectors by analyzing textual context. ELMo is the standout among these approaches, delivering significant improvements across multiple tasks and datasets, and it is widely regarded as a state-of-the-art word-vector method. The paper was published at NAACL 2018, where it received an outstanding paper award. Below is a brief introduction to this "mysterious" word-vector model.

In ELMo, BERT, and GPT-2, upper layers produce more context-specific representations than lower layers. However, these models contextualize words very differently: after adjusting for anisotropy, words in the same sentence are most similar to one another in ELMo, while in GPT-2 that similarity is almost nonexistent.
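
To make the anisotropy adjustment concrete, here is a minimal sketch, assuming the contextualized embeddings are already available as NumPy arrays; the function names (`anisotropy_baseline`, `adjusted_intra_sentence_similarity`) are illustrative, not from the cited work. The idea is to subtract the average cosine similarity between randomly sampled words (the anisotropy baseline) from the average cosine similarity between words of the same sentence.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def anisotropy_baseline(embeddings, n_pairs=1000, seed=0):
    """Average cosine similarity between randomly sampled word pairs.

    `embeddings` is an (n_words, dim) array of contextualized vectors
    drawn from many different sentences.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(embeddings), size=(n_pairs, 2))
    return float(np.mean([cosine(embeddings[i], embeddings[j])
                          for i, j in idx if i != j]))

def adjusted_intra_sentence_similarity(sentence_embs, baseline):
    """Mean pairwise cosine similarity within one sentence,
    with the anisotropy baseline subtracted."""
    n = len(sentence_embs)
    sims = [cosine(sentence_embs[i], sentence_embs[j])
            for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(sims)) - baseline
```

Under this adjustment, a high value means words in the same sentence share representation space beyond what the geometry of the embedding space alone would predict.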

On average, less than 5% of the variance in a word's contextualized representations can be explained by a static embedding. So even in the best case, static word embeddings are a poor substitute for contextualized ones. Contextualized representations can, however, be used to create a more powerful type of static embedding: the principal components of BERT's lower-layer contextualized representations perform much better than GloVe and FastText (a minimal sketch of this distillation step follows the reading list below). If you are interested in reading further along these lines, take a look at:

- The Dark Secrets of BERT (Rogers et al., 2019)
- Evolution of Representations in the Transformer (Voita et al., 2019)
- Cross-Lingual Alignment of Contextual Word Embeddings (Schuster et al., 2019)
- The Illustrated BERT, ELMo, and co. (Alammar, 2019)
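
As a rough illustration of the distillation idea mentioned above, here is a minimal sketch, assuming you have already collected, for each word, a matrix of its contextualized vectors from many sentences (e.g., from a lower BERT layer); the helper name `distill_static_embedding` and the usage dictionary are illustrative, not the cited paper's exact procedure. Taking the first principal component of a word's occurrence vectors yields one static vector per word.

```python
import numpy as np

def distill_static_embedding(context_vectors):
    """Collapse a word's contextualized vectors into one static vector.

    `context_vectors` is an (n_occurrences, dim) array of the word's
    representations across different sentences. The first principal
    component (first right-singular vector of the centered matrix)
    serves as the static embedding.
    """
    X = np.asarray(context_vectors, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)           # center occurrences
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]                                     # (dim,) static vector

# Usage: build a static lookup table from pooled contextual states.
# contexts = {"bank": np.stack([...]), "river": np.stack([...]), ...}
# static_table = {w: distill_static_embedding(v) for w, v in contexts.items()}
```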


Latest Papers

Evaluating model robustness is critical when developing trustworthy models, not only to gain deeper understanding of model behavior, strengths, and weaknesses, but also to develop future models that are generalizable and robust across expected environments a model may encounter in deployment. In this paper we present a framework for measuring model robustness for an important but difficult text classification task: deceptive news detection. We evaluate model robustness to out-of-domain data, modality-specific features, and languages other than English. Our investigation focuses on three types of models: LSTM models trained on multiple datasets (Cross-Domain), several fusion LSTM models trained with images and text and evaluated with three state-of-the-art embeddings, BERT, ELMo, and GloVe (Cross-Modality), and character-level CNN models trained on multiple languages (Cross-Language). Our analyses reveal a significant drop in performance when testing neural models on out-of-domain data and non-English languages, which may be mitigated using diverse training data. We find that with additional image content as input, ELMo embeddings yield significantly fewer errors compared to BERT or GloVe. Most importantly, this work not only carefully analyzes deception model robustness but also provides a framework of these analyses that can be applied to new models or extended datasets in the future.
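
To illustrate what a cross-domain robustness check of this kind looks like in practice, here is a minimal sketch, assuming scikit-learn-style classifiers and a dictionary of labeled datasets; the paper's actual models are LSTMs and CNNs, and the simple bag-of-words pipeline here is a placeholder. Each model is trained on one domain and evaluated on every other, so off-diagonal scores expose the out-of-domain performance drop.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_domain_matrix(datasets):
    """Train on each domain, test on all domains.

    `datasets` maps a domain name to a ((train_texts, train_labels),
    (test_texts, test_labels)) pair. Returns a dict keyed by
    (train_domain, test_domain) with macro-F1 scores; the gap between
    diagonal and off-diagonal entries measures the out-of-domain drop.
    """
    scores = {}
    for train_name, ((X_tr, y_tr), _) in datasets.items():
        # A simple TF-IDF classifier stands in for the paper's neural models.
        model = make_pipeline(TfidfVectorizer(),
                              LogisticRegression(max_iter=1000))
        model.fit(X_tr, y_tr)
        for test_name, (_, (X_te, y_te)) in datasets.items():
            pred = model.predict(X_te)
            scores[(train_name, test_name)] = f1_score(y_te, pred,
                                                       average="macro")
    return scores
```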
