As the number of research publications grows rapidly, digital libraries must organize a vast amount of scholarly information. To meet this challenge, they use semantic techniques to build knowledge-base structures for organizing scientific information. Identifying relations between scientific terms helps in constructing a representative knowledge-base structure. Although advanced automated techniques have been developed for relation extraction, many of them were evaluated under different scenarios, which limits their comparability. To address this, this study presents a thorough empirical evaluation of eight BERT-based classification models along two factors: 1) BERT model variants, and 2) classification strategies. To simulate real-world settings, we conduct a sentence-level assessment using the abstracts of scholarly publications in three corpora: two distinct corpora and a third formed by their union. Our findings show that SciBERT models perform better than BERT-BASE models. Classifying a single relation at a time is the preferred strategy for the corpus with abundant scientific relations, while identifying multiple relations at once is beneficial for the corpus with sparse relations. Our results offer recommendations to digital library stakeholders for selecting an appropriate technique to build a structured knowledge-based system that eases the organization of scholarly information.