Little is known about the trustworthiness of predictions made by knowledge graph embedding (KGE) models. In this paper we take initial steps in this direction by investigating the calibration of KGE models, or the extent to which they output confidence scores that reflect the expected correctness of predicted knowledge graph triples. We first conduct an evaluation under the standard closed-world assumption (CWA), in which predicted triples not already in the knowledge graph are considered false, and show that existing calibration techniques are effective for KGE under this (common but narrow) assumption. Next, we introduce the more realistic but challenging open-world assumption (OWA), in which unobserved predictions are not considered true or false until ground-truth labels are obtained. Here, we show that existing calibration techniques are much less effective under the OWA than the CWA, and provide explanations for this discrepancy. Finally, to motivate the utility of calibration for KGE from a practitioner's perspective, we conduct a unique case study of human-AI collaboration, showing that calibrated predictions can improve human performance in a knowledge graph completion task.
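As a concrete illustration of what calibration means in this setting, the sketch below (an illustrative example, not the paper's code) applies Platt scaling, one standard post-hoc calibration technique, to raw KGE triple scores under closed-world labels and measures expected calibration error. The function names, the use of scikit-learn, and the choice of Platt scaling are assumptions made for illustration rather than details taken from the paper.

```python
# Illustrative sketch only: post-hoc calibration of KGE confidence scores under the
# closed-world assumption (observed triples labeled 1, sampled unobserved triples 0).
# Platt scaling via scikit-learn is an assumed technique here, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt_scaling(val_scores: np.ndarray, val_labels: np.ndarray):
    """Fit a 1-D logistic regression mapping raw triple scores to probabilities."""
    lr = LogisticRegression()
    lr.fit(val_scores.reshape(-1, 1), val_labels)
    return lambda scores: lr.predict_proba(np.asarray(scores).reshape(-1, 1))[:, 1]

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Average |empirical accuracy - mean confidence| over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(probs, edges[1:-1])  # bin index in [0, n_bins)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece

# Hypothetical usage with scores from any trained KGE model on held-out triples:
# calibrate = fit_platt_scaling(val_scores, val_labels)
# print(expected_calibration_error(calibrate(test_scores), test_labels))
```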