WikiKG90M in KDD Cup 2021 is a large encyclopedic knowledge graph that could benefit various downstream applications such as question answering and recommender systems. Participants are invited to complete the knowledge graph by predicting missing triplets. Recent representation learning methods have achieved great success on standard datasets such as FB15k-237. We therefore train advanced algorithms from different domains to learn the triplets, including OTE, QuatE, RotatE and TransE. Notably, we modify OTE into NOTE (short for Norm-OTE) for better performance. Besides, we use both DeepWalk and a post-smoothing technique to capture the graph structure as a supplement. In addition to these representations, we also use various statistical probabilities among the head entities, relations and tail entities for the final prediction. Experimental results show that an ensemble of state-of-the-art representation learning methods can draw on each other's strengths, and we develop feature engineering from the validation candidates for further improvements. Please note that we apply the same strategy on the test set for the final inference, and that these features may not be practical in the real world when ranking against all entities.
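To make the triplet-prediction setting concrete, here is a minimal sketch of TransE-style candidate ranking, the simplest of the embedding models listed above. TransE scores a triplet (h, r, t) by how well the relation vector translates the head embedding onto the tail embedding. The function names and the use of the L1 norm are illustrative assumptions, not the competition implementation.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE score for a triplet: the closer h + r is to t,
    the higher (less negative) the score. L1 norm assumed here."""
    return -np.linalg.norm(h + r - t, ord=1)

def rank_tail_candidates(h, r, candidates):
    """Rank candidate tail embeddings (one per row) for a given
    head and relation, best candidate index first."""
    scores = -np.linalg.norm(h + r - candidates, ord=1, axis=1)
    return np.argsort(-scores)
```

In the actual task, each test query supplies a (head, relation) pair and a set of candidate tails, and the model's ranking over those candidates determines the MRR metric.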
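The statistical probabilities mentioned above can be sketched as simple conditional frequencies estimated from the training triples, for example how often each tail entity co-occurs with each relation. The exact features used in the competition are not specified here; this is one plausible instance under that assumption.

```python
from collections import Counter

def build_tail_given_relation(triples):
    """Estimate P(t | r) from training triples (h, r, t) by counting
    how often each tail appears with each relation. Returns a scoring
    function usable as a feature for ranking candidates."""
    rel_counts = Counter()
    pair_counts = Counter()
    for h, r, t in triples:
        rel_counts[r] += 1
        pair_counts[(r, t)] += 1

    def prob(r, t):
        # Zero probability for relations never seen in training.
        return pair_counts[(r, t)] / rel_counts[r] if rel_counts[r] else 0.0

    return prob
```

Such count-based features are cheap to compute over the candidate sets provided for validation and test queries, but, as the abstract notes, they would be far less practical when ranking against all ninety million entities.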