DELICATE: Diachronic Entity LInking using Classes And Temporal Evidence

Despite remarkable advances in Natural Language Processing, Entity Linking (EL) remains challenging in the humanities because of complex document typologies, a lack of domain-specific datasets and models, and long-tail entities, i.e., entities under-represented in Knowledge Bases (KBs). This paper addresses these issues with two main contributions. The first is DELICATE, a novel neuro-symbolic method for EL on historical Italian that combines a BERT-based encoder with contextual information from Wikidata to select appropriate KB entities using temporal plausibility and entity type consistency. The second is ENEIDE, a multi-domain EL corpus in historical Italian, semi-automatically extracted from two annotated editions spanning the 19th to the 20th century and including literary and political texts. Results show that DELICATE outperforms other EL models on historical Italian, even when compared with larger architectures with billions of parameters. Moreover, further analyses reveal that DELICATE's confidence scores and feature sensitivity yield results that are more explainable and interpretable than those of purely neural methods.
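The abstract names two symbolic signals, temporal plausibility and entity type consistency, without giving implementation details. The following is a minimal sketch of how such signals could prune a candidate list, not the authors' implementation. It assumes that each candidate already carries a similarity score from some encoder, and that both signals act as hard filters (the actual method may weight them instead). The `Candidate` class, the `link` function, the identifiers, and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A KB candidate for a mention (all fields are illustrative assumptions)."""
    qid: str                      # placeholder identifier, not a real Wikidata QID
    label: str
    entity_type: str              # coarse class such as "PER", "LOC", "ORG"
    earliest_year: Optional[int]  # e.g. birth or inception year; None if unknown
    score: float                  # encoder similarity, assumed to be precomputed


def is_temporally_plausible(candidate: Candidate, doc_year: int) -> bool:
    # An entity that did not yet exist when the document was written
    # cannot be what the document refers to. Unknown dates are kept.
    return candidate.earliest_year is None or candidate.earliest_year <= doc_year


def link(candidates: List[Candidate], mention_type: str, doc_year: int) -> Optional[Candidate]:
    """Return the best-scoring candidate that passes both symbolic checks."""
    plausible = [
        c for c in candidates
        if c.entity_type == mention_type and is_temporally_plausible(c, doc_year)
    ]
    if not plausible:
        return None  # no candidate survives: leave the mention unlinked
    return max(plausible, key=lambda c: c.score)


# Hypothetical candidates for the mention "Garibaldi" in a text dated 1860.
candidates = [
    Candidate("Q_PERSON", "Giuseppe Garibaldi (general)", "PER", 1807, 0.71),
    Candidate("Q_SHIP", "Giuseppe Garibaldi (ship)", "OTHER", 1983, 0.78),
]

best = link(candidates, mention_type="PER", doc_year=1860)
print(best.label if best else "NIL")
```

In this toy case the ship has the higher encoder score but is discarded by both checks, so the person is selected. This is the kind of correction that temporal and type evidence is meant to provide for historical texts.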

