Developing new ideas and algorithms in the fields of graph processing and relational learning requires datasets to work with, and WikiData is the largest open-source knowledge graph, comprising more than fifty million entities. It is larger than needed in many cases, and even too large to be processed easily, yet it remains a goldmine of relevant facts and subgraphs. Using this graph directly is time-consuming and prone to task-specific tuning, which can affect the reproducibility of results. Providing a unified framework for extracting topic-specific subgraphs solves this problem and allows researchers to evaluate their algorithms on common datasets. This paper presents various topic-specific subgraphs of WikiData, along with the generic Python code used to extract them. These datasets can help develop new methods of knowledge graph processing and relational learning.