We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural language models, Flair, ELMo, and BERT, are used to generate contextualized word embeddings such that different embeddings are generated for the same word type; these embeddings are grouped by their senses, which are manually annotated in the SemCor dataset. We hypothesize that not all dimensions are equally important for downstream tasks, so that our algorithm can detect unessential dimensions and discard them without hurting performance. To verify this hypothesis, we first mask the dimensions our algorithm deems unessential, apply the masked word embeddings to a word sense disambiguation (WSD) task, and compare their performance against that achieved by the original embeddings. Several KNN approaches are experimented with to establish strong baselines for WSD. Our results show that the masked word embeddings do not hurt performance and can improve it by 3%. Our work can be used to conduct future research on the interpretability of contextualized embeddings.
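The abstract's core idea, scoring embedding dimensions by how tightly they cluster within sense groups and masking the rest, can be sketched offline as follows. This is a minimal illustration, not the paper's online algorithm: the scoring criterion (within-group variance per dimension), the `keep_ratio` parameter, and the function names are assumptions introduced here for clarity.

```python
import numpy as np

def within_group_distance(embeddings, labels):
    """Mean squared distance of each embedding to its sense-group centroid."""
    labels = np.array(labels)
    total, count = 0.0, 0
    for sense in set(labels):
        group = embeddings[labels == sense]
        centroid = group.mean(axis=0)
        total += ((group - centroid) ** 2).sum()
        count += len(group)
    return total / count

def mask_unessential(embeddings, labels, keep_ratio=0.5):
    """Keep the dimensions with the lowest within-group variance.

    Dimensions that vary wildly inside a sense group carry little
    sense-discriminating signal and are treated as unessential.
    """
    labels = np.array(labels)
    scores = np.zeros(embeddings.shape[1])
    for sense in set(labels):
        scores += embeddings[labels == sense].var(axis=0)
    keep = np.argsort(scores)[: int(keep_ratio * embeddings.shape[1])]
    mask = np.zeros(embeddings.shape[1], dtype=bool)
    mask[keep] = True
    return mask
```

Under this sketch, a downstream WSD baseline would run KNN over `embeddings[:, mask]` instead of the full vectors, mirroring the masked-versus-original comparison described above.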