Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service management. However, existing approaches pay little attention to two problems: (1) How to deal with multiple types of seed entities simultaneously? (2) How to leverage the power of pre-trained language models (PLMs)? To address these two problems, we study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads. During the co-expansion process, we use PLMs to derive embeddings of candidate entities and use these embeddings to calculate similarities between entities. Experimental results show that our proposed SECoExpan framework significantly outperforms previous approaches.
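To make the idea of similarity-based co-expansion concrete, the following is a minimal sketch and not the SECoExpan implementation. It assumes that every entity already has an embedding vector; the hand-written three-dimensional vectors in `EMBED` are hypothetical stand-ins for PLM-derived embeddings. The function name `co_expand` and the assignment rule (each candidate goes to the type whose seeds it is most similar to on average) are also illustrative assumptions.

```python
import math

# Hypothetical toy vectors standing in for PLM-derived entity embeddings.
EMBED = {
    "python":  [1.0, 0.1, 0.0],
    "java":    [0.9, 0.2, 0.1],
    "linux":   [0.1, 1.0, 0.0],
    "windows": [0.0, 0.9, 0.2],
    "ruby":    [0.95, 0.05, 0.1],
    "macos":   [0.1, 0.95, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def co_expand(seeds, candidates, embed, top_k=1):
    """Expand several seed sets at once.

    Each candidate is scored against every type by its mean cosine
    similarity to that type's seeds, assigned to the best-scoring type,
    and the top_k candidates per type are returned.
    (Assumed rule for illustration only.)
    """
    scored = {entity_type: [] for entity_type in seeds}
    for cand in candidates:
        best_type, best_score = None, float("-inf")
        for entity_type, seed_list in seeds.items():
            score = sum(cosine(embed[cand], embed[s]) for s in seed_list) / len(seed_list)
            if score > best_score:
                best_type, best_score = entity_type, score
        scored[best_type].append((best_score, cand))
    # Sort each type's candidates by descending score and keep the top_k.
    return {
        entity_type: [c for _, c in sorted(items, reverse=True)[:top_k]]
        for entity_type, items in scored.items()
    }


seeds = {"Language": ["python", "java"], "OS": ["linux", "windows"]}
result = co_expand(seeds, ["ruby", "macos"], EMBED)
print(result)
```

Handling all types in one pass means a candidate competes across types rather than being expanded independently for each seed set, which is one simple way to keep the expanded sets disjoint.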