We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural language models, Flair, ELMo, and BERT, are used to generate contextualized word embeddings such that different embeddings are generated for the same word type; these embeddings are grouped by their senses, which are manually annotated in the SemCor dataset. We hypothesize that not all dimensions are equally important for downstream tasks, so that our algorithm can detect unessential dimensions and discard them without hurting performance. To verify this hypothesis, we first mask the dimensions our algorithm deems unessential, apply the masked word embeddings to a word sense disambiguation (WSD) task, and compare their performance against that achieved by the original embeddings. Several KNN approaches are experimented with to establish strong baselines for WSD. Our results show that the masked word embeddings do not hurt performance and can improve it by 3%. Our work can be used to conduct future research on the interpretability of contextualized embeddings.
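The abstract's core idea, scoring embedding dimensions by how tightly they cluster within sense groups and masking the rest, can be sketched offline as follows. This is a minimal illustration, not the paper's online algorithm: the scoring criterion (within-group variance per dimension), the `keep_ratio` parameter, and the function names are assumptions introduced here for clarity.

```python
import numpy as np

def within_group_distance(embeddings, labels):
    """Mean squared distance of each embedding to its sense-group centroid."""
    labels = np.array(labels)
    total, count = 0.0, 0
    for sense in set(labels):
        group = embeddings[labels == sense]
        centroid = group.mean(axis=0)
        total += ((group - centroid) ** 2).sum()
        count += len(group)
    return total / count

def mask_unessential(embeddings, labels, keep_ratio=0.5):
    """Keep the dimensions with the lowest within-group variance.

    Dimensions that vary wildly inside a sense group carry little
    sense-discriminating signal and are treated as unessential.
    """
    labels = np.array(labels)
    scores = np.zeros(embeddings.shape[1])
    for sense in set(labels):
        scores += embeddings[labels == sense].var(axis=0)
    keep = np.argsort(scores)[: int(keep_ratio * embeddings.shape[1])]
    mask = np.zeros(embeddings.shape[1], dtype=bool)
    mask[keep] = True
    return mask
```

Under this sketch, a downstream WSD baseline would run KNN over `embeddings[:, mask]` instead of the full vectors, mirroring the masked-versus-original comparison described above.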