What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node based on its type and the information in the KG. Stepping away from traditional support/confidence-based rule mining techniques, we propose KGist (Knowledge Graph Inductive SummarizaTion), which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length (MDL) principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago) and to tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs.
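To make the MDL idea concrete, here is a toy sketch of choosing rules by compression and then reading off rule exceptions as candidate missing information. It is not KGist's actual encoding or search procedure. The rule format (a single root type, relation, and target type), the bit costs, the greedy loop, and the function names `total_bits`, `greedy_summary`, and `exceptions` are all hypothetical simplifications chosen for illustration. The cost follows the usual two-part MDL form L(M) + L(G | M): a rule costs bits to state, but it makes each edge it covers cheaper to encode. Root nodes that lack the expected edge must be listed as exceptions.

```python
import math


def total_bits(rules, types, edges):
    """Toy two-part MDL cost L(M) + L(G | M), in bits (hypothetical encoding).

    rules: list of (root_type, relation, target_type) triples (assumed rule format)
    types: dict mapping every node to exactly one type (simplifying assumption)
    edges: set of (subject, relation, object) triples
    """
    nodes = list(types)
    n_rels = max(len({r for _, r, _ in edges}), 1)
    n_types = max(len(set(types.values())), 1)

    # L(M): each rule names a root type, a relation label and a target type.
    model_bits = len(rules) * (2 * math.log2(n_types) + math.log2(n_rels))

    data_bits = 0.0
    covered = set()
    for t, r, u in rules:
        roots = [v for v in nodes if types[v] == t]
        n_targets = sum(1 for v in nodes if types[v] == u)
        matched = [e for e in edges
                   if e[1] == r and types[e[0]] == t and types[e[2]] == u]
        # A covered edge only needs its target, chosen among the type-u nodes.
        for e in matched:
            if e not in covered:
                covered.add(e)
                data_bits += math.log2(n_targets)
        # Exceptions: roots the rule expects to have an r-edge, but which lack one.
        has_edge = {s for s, _, _ in matched}
        n_exceptions = sum(1 for v in roots if v not in has_edge)
        if n_exceptions:
            data_bits += n_exceptions * math.log2(len(roots))

    # Edges that no rule explains are spelled out: source, target, relation label.
    uncovered = len(edges) - len(covered)
    if uncovered:
        data_bits += uncovered * (2 * math.log2(len(nodes)) + math.log2(n_rels))
    return model_bits + data_bits


def greedy_summary(types, edges):
    """Greedily add the candidate rule that most reduces total_bits; stop when none does."""
    candidates = {(types[s], r, types[o]) for s, r, o in edges}
    rules, best = [], total_bits([], types, edges)
    while True:
        scored = [(total_bits(rules + [c], types, edges), c)
                  for c in candidates if c not in rules]
        if not scored:
            break
        cost, rule = min(scored)
        if cost >= best:  # no remaining rule shortens the description
            break
        rules.append(rule)
        best = cost
    return rules, best


def exceptions(rule, types, edges):
    """Root nodes that violate a rule: candidate locations of missing information."""
    t, r, u = rule
    has_edge = {s for s, rel, o in edges
                if rel == r and types[s] == t and types[o] == u}
    return sorted(v for v, ty in types.items() if ty == t and v not in has_edge)


# Tiny made-up KG: three of four people have a born_in edge to a city.
types = {"alice": "person", "bob": "person", "carol": "person", "dave": "person",
         "paris": "city", "rome": "city", "oslo": "city"}
edges = {("alice", "born_in", "paris"), ("bob", "born_in", "rome"),
         ("carol", "born_in", "oslo"), ("alice", "knows", "bob")}

rules, bits = greedy_summary(types, edges)
print(rules)                               # [('person', 'born_in', 'city')]
print(exceptions(rules[0], types, edges))  # ['dave']
```

In this example, the rule "a person is born in a city" pays for itself in bits, so it is kept, while "a person knows a person" does not. The single exception, `dave`, is where this toy summary suggests that information may be missing.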
