Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of using pre-trained language models to perform entity resolution and has achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of using different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data"; 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and on the dataset from the online product domain; and 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, which will require the injection of expert knowledge in future work.
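To make the idea of knowledge augmentation through prompting concrete, the following is a minimal sketch, not the KAER implementation. It assumes records are flattened into a COL/VAL text sequence, loosely modeled on the serialization used by Ditto-style matchers, and that external knowledge arrives as a per-column hint such as a semantic type. The function names `serialize` and `serialize_pair`, the `knowledge` mapping, and the parenthesized hint format are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional


def serialize(record: Dict[str, str],
              knowledge: Optional[Dict[str, str]] = None) -> str:
    """Flatten a record into 'COL <name> VAL <value>' text.

    `knowledge` is a hypothetical mapping from column name to an injected
    hint (for example a semantic type). When a hint exists, it is appended
    in parentheses after the column name; this format is an assumption.
    """
    knowledge = knowledge or {}
    parts = []
    for col, val in record.items():
        hint = knowledge.get(col)
        name = f"{col} ({hint})" if hint else col
        parts.append(f"COL {name} VAL {val}")
    return " ".join(parts)


def serialize_pair(left: Dict[str, str],
                   right: Dict[str, str],
                   knowledge: Optional[Dict[str, str]] = None) -> str:
    """Join two serialized records with a separator marker, producing the
    text a sequence-pair classifier could be asked to label match/no-match."""
    return f"{serialize(left, knowledge)} [SEP] {serialize(right, knowledge)}"


if __name__ == "__main__":
    a = {"title": "iphone 12", "price": "799"}
    b = {"title": "apple iphone 12 64gb", "price": "799.00"}
    print(serialize_pair(a, b, {"title": "product"}))
```

In this sketch the matcher itself is left out; only the input text changes, which is the sense in which the knowledge is injected by prompting rather than by altering the model.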