Information extraction from scientific literature can be challenging because such text is highly specialised. We describe the entity recognition methods we developed for the DEAL (Detecting Entities in the Astrophysics Literature) shared task. The aim of the task is to build a system that identifies named entities in a dataset composed of scholarly articles from the astrophysics literature. We designed our participation to allow an empirical comparison between word-based tagging and span-based classification methods. When evaluated on two hidden test sets provided by the organizer, our best-performing submission achieved $F_1$ scores of 0.8307 (validation phase) and 0.7990 (testing phase).
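To make the contrast between the two formulations concrete, here is a minimal sketch; it is not the submitted system. It assumes a BIO tagging scheme for the word-based view and a fixed maximum span width for the span-based view, and it scores predictions with exact-match span $F_1$, which may differ from the organizer's official scorer. The function names, the example sentence, and the labels `Telescope` and `CelestialObject` are illustrative assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Word-based view: decode one BIO tag per word into labelled spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # A stray I- tag is treated leniently as the start of a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def enumerate_spans(n_words: int, max_width: int) -> List[Tuple[int, int]]:
    """Span-based view: list every candidate span of up to max_width words.

    A span classifier would then assign each candidate a label or 'no entity'.
    """
    return [
        (s, e)
        for s in range(n_words)
        for e in range(s + 1, min(s + max_width, n_words) + 1)
    ]


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match F1 over labelled spans (an assumed metric, see lead-in)."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and labels, for illustration only.
words = ["The", "Hubble", "Space", "Telescope", "observed", "M31"]
tags = ["O", "B-Telescope", "I-Telescope", "I-Telescope", "O", "B-CelestialObject"]

gold = bio_to_spans(tags)
candidates = enumerate_spans(len(words), max_width=3)
score = span_f1(gold, [(1, 4, "Telescope")])
```

In the word-based formulation the model makes one decision per word and entities are recovered by decoding, whereas in the span-based formulation the model makes one decision per candidate span, so the number of decisions grows with the maximum span width.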