We propose a new global entity disambiguation (ED) model based on contextualized embeddings of words and entities. The model builds on a bidirectional transformer encoder (i.e., BERT) and produces contextualized embeddings for the words and entities in the input text. It is trained with a new masked entity prediction task, in which the model must predict randomly masked entities in entity-annotated texts obtained from Wikipedia. We further extend the model by solving ED as a sequential decision task, which enables it to capture global contextual information. We evaluate our model on six standard ED datasets and achieve new state-of-the-art results on all but one of them.
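The following is a minimal sketch of a masked entity prediction objective of the kind described above, not the authors' implementation: a linear head over per-entity encoder states is trained with cross-entropy computed only at masked entity positions, analogous to BERT's masked language modeling loss. All names and sizes here (MaskedEntityPredictionHead, ENTITY_VOCAB_SIZE, the random tensors standing in for the encoder output) are illustrative assumptions.

```python
import torch
import torch.nn as nn

ENTITY_VOCAB_SIZE = 1000  # toy size; the real vocabulary covers Wikipedia entities
HIDDEN_SIZE = 64          # toy size; BERT-base uses 768

class MaskedEntityPredictionHead(nn.Module):
    """Maps per-entity encoder states to logits over the entity vocabulary."""
    def __init__(self, hidden_size: int, entity_vocab_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, entity_vocab_size)

    def forward(self, entity_states: torch.Tensor) -> torch.Tensor:
        # entity_states: (batch, num_entities, hidden) -> (batch, num_entities, vocab)
        return self.classifier(entity_states)

def masked_entity_loss(logits, gold_entity_ids, is_masked):
    """Cross-entropy restricted to masked positions, as in masked LM training."""
    return nn.functional.cross_entropy(logits[is_masked], gold_entity_ids[is_masked])

# Toy forward/backward pass; random tensors stand in for the BERT encoder output.
batch, num_entities = 2, 5
entity_states = torch.randn(batch, num_entities, HIDDEN_SIZE)
gold_entity_ids = torch.randint(0, ENTITY_VOCAB_SIZE, (batch, num_entities))
is_masked = torch.zeros(batch, num_entities, dtype=torch.bool)
is_masked[:, ::2] = True  # mask every other entity annotation

head = MaskedEntityPredictionHead(HIDDEN_SIZE, ENTITY_VOCAB_SIZE)
loss = masked_entity_loss(head(entity_states), gold_entity_ids, is_masked)
loss.backward()
print(float(loss))
```

In the actual model, the entity states would come from the transformer encoder run over the word and entity sequence; the toy tensors above only exercise the loss computation.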
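As a rough illustration of treating ED as a sequential decision task, the sketch below greedily resolves one mention per step, committing to the most confident prediction and passing the partial assignment back to the scorer so that later decisions can condition on earlier ones. The scorer interface (score_fn) and the toy fixed logits are assumptions for illustration, not the paper's architecture.

```python
import torch

def sequential_ed(score_fn, num_mentions: int) -> dict:
    """Resolve mentions one per step, most confident first.

    score_fn(resolved) returns (num_mentions, vocab_size) logits given the
    current partial assignment `resolved` (mention index -> entity id), so a
    real scorer could condition on already-disambiguated entities.
    """
    resolved: dict = {}
    done = torch.zeros(num_mentions, dtype=torch.bool)
    while len(resolved) < num_mentions:
        probs = torch.softmax(score_fn(resolved), dim=-1)
        conf, pred = probs.max(dim=-1)       # best candidate per mention
        conf = conf.masked_fill(done, -1.0)  # skip already-resolved mentions
        i = int(conf.argmax())               # most confident open mention
        resolved[i] = int(pred[i])
        done[i] = True
    return resolved

# Toy scorer with fixed random logits that ignores the partial assignment;
# a real model would re-encode the text with the resolved entities as input.
torch.manual_seed(0)
fixed_logits = torch.randn(4, 10)  # 4 mentions, 10 candidate entities
print(sequential_ed(lambda resolved: fixed_logits, num_mentions=4))
```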