A Roundup of Representative Papers in Knowledge Representation Learning

February 14, 2018 | AI科技评论 | Xu Han, Shulin Cao

Editor's note from AI科技评论: Knowledge Representation Learning (KRL) aims to map the entities and relations of a knowledge graph (KG) into a low-dimensional vector space so that the KG can be used efficiently. Xu Han, a PhD student at Tsinghua University, and Shulin Cao, an undergraduate at Beijing Normal University, have compiled this list of representative KRL papers, covering the important work in this direction from recent years.

Contents

Survey papers:

  • Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, and Pascal Vincent. IEEE 2013. 

  • Knowledge Representation Learning: A Review. (In Chinese) Zhiyuan Liu, Maosong Sun, Yankai Lin, Ruobing Xie. 2016.

  • A Review of Relational Machine Learning for Knowledge Graphs. Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich. IEEE 2016. 

  • Knowledge Graph Embedding: A Survey of Approaches and Applications. Quan Wang, Zhendong Mao, Bin Wang, Li Guo. IEEE 2017. 

Journal and Conference papers:

  • RESCAL: A Three-Way Model for Collective Learning on Multi-Relational Data. Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel. ICML 2011.

  • SE: Learning Structured Embeddings of Knowledge Bases. Antoine Bordes, Jason Weston, Ronan Collobert, Yoshua Bengio. AAAI 2011. 

  • LFM: A Latent Factor Model for Highly Multi-relational Data. Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, Guillaume R. Obozinski. NIPS 2012. 

  • NTN: Reasoning With Neural Tensor Networks for Knowledge Base Completion. Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng. NIPS 2013.

  • TransE: Translating Embeddings for Modeling Multi-relational Data. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko. NIPS 2013.

  • TransH: Knowledge Graph Embedding by Translating on Hyperplanes. Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. AAAI 2014.

  • TransR & CTransR: Learning Entity and Relation Embeddings for Knowledge Graph Completion. Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. AAAI 2015. 

  • TransD: Knowledge Graph Embedding via Dynamic Mapping Matrix. Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, Jun Zhao. ACL 2015. 

  • TransA: An Adaptive Approach for Knowledge Graph Embedding. Han Xiao, Minlie Huang, Hao Yu, Xiaoyan Zhu. arXiv 2015. 

  • KG2E: Learning to Represent Knowledge Graphs with Gaussian Embedding. Shizhu He, Kang Liu, Guoliang Ji and Jun Zhao. CIKM 2015. 

  • DistMult: Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015. 

  • PTransE: Modeling Relation Paths for Representation Learning of Knowledge Bases. Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, Song Liu. EMNLP 2015. 

  • RTransE: Composing Relationships with Translations. Alberto García-Durán, Antoine Bordes, Nicolas Usunier. EMNLP 2015.

  • ManifoldE: From One Point to A Manifold: Knowledge Graph Embedding For Precise Link Prediction. Han Xiao, Minlie Huang and Xiaoyan Zhu. IJCAI 2016. 

  • TransG: A Generative Mixture Model for Knowledge Graph Embedding. Han Xiao, Minlie Huang, Xiaoyan Zhu. ACL 2016. 

  • ComplEx: Complex Embeddings for Simple Link Prediction. Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier and Guillaume Bouchard. ICML 2016. 

  • HolE: Holographic Embeddings of Knowledge Graphs. Maximilian Nickel, Lorenzo Rosasco, Tomaso A. Poggio. AAAI 2016. 

  • KR-EAR: Knowledge Representation Learning with Entities, Attributes and Relations. Yankai Lin, Zhiyuan Liu, Maosong Sun. IJCAI 2016. 

  • TranSparse: Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. Guoliang Ji, Kang Liu, Shizhu He, Jun Zhao. AAAI 2016. 

  • TKRL: Representation Learning of Knowledge Graphs with Hierarchical Types. Ruobing Xie, Zhiyuan Liu, Maosong Sun. IJCAI 2016. 

  • STransE: A Novel Embedding Model of Entities and Relationships in Knowledge Bases. Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. NAACL-HLT 2016.

  • GAKE: Graph Aware Knowledge Embedding. Jun Feng, Minlie Huang, Yang Yang, Xiaoyan Zhu. COLING 2016. 

  • DKRL: Representation Learning of Knowledge Graphs with Entity Descriptions. Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, Maosong Sun. AAAI 2016.

  • ProPPR: Learning First-Order Logic Embeddings via Matrix Factorization. William Yang Wang, William W. Cohen. IJCAI 2016.

  • SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions. Han Xiao, Minlie Huang, Lian Meng, Xiaoyan Zhu. AAAI 2017. 

  • ProjE: Embedding Projection for Knowledge Graph Completion. Baoxu Shi, Tim Weninger. AAAI 2017. 

  • ANALOGY: Analogical Inference for Multi-relational Embeddings. Hanxiao Liu, Yuexin Wu, Yiming Yang. ICML 2017.

  • IKRL: Image-embodied Knowledge Representation Learning. Ruobing Xie, Zhiyuan Liu, Tat-Seng Chua, Huan-Bo Luan, Maosong Sun. IJCAI 2017. 

  • IPTransE: Iterative Entity Alignment via Joint Knowledge Embeddings. Hao Zhu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. IJCAI 2017.

Survey papers

/1 Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, and Pascal Vincent. IEEE 2013.

Paper:

https://arxiv.org/pdf/1206.5538.pdf

Abstract:

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning. 

/2 Knowledge Representation Learning: A Review. (In Chinese) Zhiyuan Liu, Maosong Sun, Yankai Lin, Ruobing Xie. 2016. 

Paper:

http://crad.ict.ac.cn/CN/abstract/abstract3099.shtml

Abstract:

Human-built knowledge bases are usually represented as networks, where nodes stand for entities and edges for the relations between them. Under this network representation, dedicated graph algorithms must be designed to store and exploit the knowledge base, which is time-consuming and labor-intensive and also suffers from data sparsity. Recently, representation learning techniques, exemplified by deep learning, have attracted wide attention. Representation learning aims to encode the semantics of an object as a dense, low-dimensional, real-valued vector; knowledge representation learning applies this idea to the entities and relations of a knowledge base. The technique can efficiently compute the semantic relatedness of entities and relations in a low-dimensional space, effectively alleviates data sparsity, and significantly improves the performance of knowledge acquisition, fusion, and reasoning. This paper surveys recent advances in knowledge representation learning, summarizes the main challenges the technique faces along with possible solutions, and discusses its future directions and prospects.

/3 A Review of Relational Machine Learning for Knowledge Graphs. Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich. IEEE 2016. 

Paper:

https://arxiv.org/pdf/1503.00759.pdf

Abstract:

Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be “trained” on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two fundamentally different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on latent feature models such as tensor factorization and multiway neural networks. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. To this end, we also discuss Google’s Knowledge Vault project as an example of such combination.

/4 Knowledge Graph Embedding: A Survey of Approaches and Applications. Quan Wang, Zhendong Mao, Bin Wang, Li Guo. IEEE 2017. 

Paper:

http://ieeexplore.ieee.org/abstract/document/8047276/

Abstract:

Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state-of-the-arts but also those with latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.

Journal and Conference papers

/1 RESCAL: A Three-Way Model for Collective Learning on Multi-Relational Data. Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel. ICML 2011. 

Paper:

http://www.icml-2011.org/papers/438_icmlpaper.pdf

Code:

https://github.com/thunlp/OpenKE

Abstract:

Relational learning is becoming increasingly important in many areas of application. Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. We show that unlike other tensor approaches, our method is able to perform collective learning via the latent components of the model and provide an efficient algorithm to compute the factorization. We substantiate our theoretical considerations regarding the collective learning capabilities of our model by means of experiments on both a new dataset and a dataset commonly used in entity resolution. Furthermore, we show on common benchmark datasets that our approach achieves better or on-par results compared to current state-of-the-art relational learning solutions, while it is significantly faster to compute. 
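
To make the three-way factorization concrete, here is a minimal numpy sketch of RESCAL's bilinear score f(h, r, t) = h^T R_r t; the dimension and the random vectors and matrix are illustrative placeholders, not parameters learned by the paper's factorization algorithm.

```python
# A minimal sketch of the RESCAL scoring function under placeholder parameters;
# the paper learns these via factorization of the adjacency tensor.
import numpy as np

d = 50                          # embedding dimension (illustrative)
h = np.random.randn(d)          # head entity embedding
t = np.random.randn(d)          # tail entity embedding
R = np.random.randn(d, d)       # full d x d interaction matrix per relation

score = h @ R @ t               # bilinear score f(h, r, t) = h^T R_r t
print(score)
```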

/2 SE: Learning Structured Embeddings of Knowledge Bases. Antoine Bordes, Jason Weston, Ronan Collobert, Yoshua Bengio. AAAI 2011. 

Paper:

http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/download/3659/3898

Abstract:

Many Knowledge Bases (KBs) are now readily available and encompass colossal quantities of information thanks to either a long-term funding effort (e.g. WordNet, OpenCyc) or a collaborative process (e.g. Freebase, DBpedia). However, each of them is based on a different rigid symbolic framework which makes it hard to use their data in other systems. It is unfortunate because such rich structured knowledge might lead to a huge leap forward in many other areas of AI like natural language processing (word-sense disambiguation, natural language understanding, ...), vision (scene classification, image semantic annotation, ...) or collaborative filtering. In this paper, we present a learning process based on an innovative neural network architecture designed to embed any of these symbolic representations into a more flexible continuous vector space in which the original knowledge is kept and enhanced. These learnt embeddings would allow data from any KB to be easily used in recent machine learning methods for prediction and information retrieval. We illustrate our method on WordNet and Freebase and also present a way to adapt it to knowledge extraction from raw text.

/3 LFM: A Latent Factor Model for Highly Multi-relational Data. Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, Guillaume R. Obozinski. NIPS 2012. 

Paper:

http://papers.nips.cc/paper/4744-a-latent-factor-model-for-highly-multi-relational-data.pdf

Abstract:

Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, an NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations.

/4 NTN: Reasoning With Neural Tensor Networks for Knowledge Base Completion. Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng. NIPS 2013. 

Paper:

http://papers.nips.cc/paper/5028-reasoning-with-neural-tensor-networks-for-knowledge-base-completion.pdf

Abstract:

Knowledge bases are an important resource for question answering and other tasks but often suffer from incompleteness and lack of ability to reason over their discrete entities and relationships. In this paper we introduce an expressive neural tensor network suitable for reasoning over relationships between two entities. Previous work represented entities as either discrete atomic units or with a single entity vector representation. We show that performance can be improved when entities are represented as an average of their constituting word vectors. This allows sharing of statistical strength between, for instance, facts involving the “Sumatran tiger” and “Bengal tiger.” Lastly, we demonstrate that all models improve when these word vectors are initialized with vectors learned from unsupervised large corpora. We assess the model by considering the problem of predicting additional true relations between entities given a subset of the knowledge base. Our model outperforms previous models and can classify unseen relationships in WordNet and FreeBase with an accuracy of 86.2% and 90.0%, respectively.
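
As a rough illustration of the bilinear tensor layer described above, the sketch below computes the NTN score u_r^T tanh(h^T W_r^[1:k] t + V_r [h; t] + b_r) with random placeholder parameters; the dimensions and variable names are assumptions for readability, not the paper's released code.

```python
# A hedged sketch of the NTN scoring function with random placeholder parameters.
import numpy as np

d, k = 100, 4                           # entity dimension and tensor slices (illustrative)
h, t = np.random.randn(d), np.random.randn(d)
W = np.random.randn(k, d, d)            # bilinear tensor: one d x d slice per output unit
V = np.random.randn(k, 2 * d)           # standard feed-forward weights
b = np.random.randn(k)                  # bias
u = np.random.randn(k)                  # relation-specific output vector

bilinear = np.einsum('i,kij,j->k', h, W, t)   # h^T W^[1:k] t, a k-vector
score = u @ np.tanh(bilinear + V @ np.concatenate([h, t]) + b)
```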

/5 TransE: Translating Embeddings for Modeling Multi-relational Data. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko. NIPS 2013. 

Paper:

http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf

Code:

https://github.com/thunlp/OpenKE

Abstract:

We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Besides, it can be successfully trained on a large scale data set with 1M entities, 25k relationships and more than 17M training samples.
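
The translation assumption above reduces to a very small amount of code; here is a hedged sketch of the TransE energy d(h + r, t), with random placeholder vectors standing in for trained embeddings.

```python
# A minimal sketch of the TransE dissimilarity d(h + r, t); placeholders only.
import numpy as np

d = 50
h, r, t = (np.random.randn(d) for _ in range(3))

energy = np.linalg.norm(h + r - t, ord=1)   # L1 distance; the paper also uses L2
# Lower energy = more plausible triple; training ranks true triples below corrupted ones.
```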

/6 TransH: Knowledge Graph Embedding by Translating on Hyperplanes. Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. AAAI 2014.

Paper:

https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8531/8546

Code:

https://github.com/thunlp/OpenKE

Abstract:

We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as reflexive, one-to-many, many-to-one, and many-to-many. We note that TransE does not do well in dealing with these properties. Some complex models are capable of preserving these mapping properties but sacrifice efficiency in the process. To make a good trade-off between model capacity and efficiency, in this paper we propose TransH which models a relation as a hyperplane together with a translation operation on it. In this way, we can well preserve the above mapping properties of relations with almost the same model complexity of TransE. Additionally, as a practical knowledge graph is often far from completed, how to construct negative examples to reduce false negative labels in training is very important. Utilizing the one-to-many/many-to-one mapping property of a relation, we propose a simple trick to reduce the possibility of false negative labeling. We conduct extensive experiments on link prediction, triplet classification and fact extraction on benchmark datasets like WordNet and Freebase. Experiments show TransH delivers significant improvements over TransE on predictive accuracy with comparable capability to scale up.
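
A minimal sketch of the hyperplane idea follows, assuming a unit normal vector w_r and an in-hyperplane translation d_r; all values below are random placeholders rather than trained parameters.

```python
# TransH: translate between entity projections on a relation-specific hyperplane.
import numpy as np

dim = 50
h, t = np.random.randn(dim), np.random.randn(dim)
w = np.random.randn(dim); w /= np.linalg.norm(w)   # hyperplane normal, constrained to unit norm
d_r = np.random.randn(dim)                         # translation vector on the hyperplane

h_perp = h - (w @ h) * w                           # project head onto the hyperplane
t_perp = t - (w @ t) * w                           # project tail onto the hyperplane
energy = np.linalg.norm(h_perp + d_r - t_perp) ** 2
```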

/7 TransR & CTransR: Learning Entity and Relation Embeddings for Knowledge Graph Completion. Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. AAAI 2015.

Paper:

http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9571/9523/

Code:

https://github.com/thunlp/KB2E

https://github.com/thunlp/OpenKE

Abstract:

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
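
To illustrate the separate entity and relation spaces, here is a toy numpy sketch of the TransR energy ||M_r h + r - M_r t||; the dimensions and random parameters are placeholders, not the released KB2E implementation.

```python
# TransR: project entities into a relation-specific space, then translate.
import numpy as np

d, k = 100, 50                                     # entity / relation space dimensions
h, t = np.random.randn(d), np.random.randn(d)      # entity-space vectors
r = np.random.randn(k)                             # relation-space translation
M = np.random.randn(k, d)                          # projection matrix for this relation

energy = np.linalg.norm(M @ h + r - M @ t) ** 2    # translation after projection
```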

/8 TransD: Knowledge Graph Embedding via Dynamic Mapping Matrix. Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, Jun Zhao. ACL 2015. 

Paper:

http://anthology.aclweb.org/P/P15/P15-1067.pdf

Code:

https://github.com/thunlp/KB2E

https://github.com/thunlp/OpenKE

Abstract:

Knowledge graphs are useful resources for numerous AI applications, but they are far from completeness. Previous work such as TransE, TransH and TransR/CTransR regard a relation as translation from head entity to tail entity and the CTransR achieves state-of-the-art performance. In this paper, we propose a more fine-grained model named TransD, which is an improvement of TransR/CTransR. In TransD, we use two vectors to represent a named symbol object (entity and relation). The first one represents the meaning of a(n) entity (relation), the other one is used to construct mapping matrix dynamically. Compared with TransR/CTransR, TransD not only considers the diversity of relations, but also entities. TransD has fewer parameters and has no matrix-vector multiplication operations, which makes it applicable to large-scale graphs. In experiments, we evaluate our model on two typical tasks including triplets classification and link prediction. Evaluation results show that our approach outperforms state-of-the-art methods.
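
The dynamic mapping matrix M_rh = r_p h_p^T + I can be sketched in a few lines. Note that the paper deliberately avoids materializing this matrix (the projection is computed from vector operations alone); the illustrative toy below forms it explicitly for readability, with all values as placeholders.

```python
# TransD: mapping matrices built dynamically from entity and relation projection vectors.
import numpy as np

n, m = 100, 50                                     # entity / relation dimensions
h, h_p = np.random.randn(n), np.random.randn(n)    # entity vector + its projection vector
t, t_p = np.random.randn(n), np.random.randn(n)
r, r_p = np.random.randn(m), np.random.randn(m)    # relation vector + its projection vector

I = np.eye(m, n)
M_rh = np.outer(r_p, h_p) + I                      # head mapping, built from two vectors
M_rt = np.outer(r_p, t_p) + I                      # tail mapping
energy = np.linalg.norm(M_rh @ h + r - M_rt @ t) ** 2
```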

/9 TransA: An Adaptive Approach for Knowledge Graph Embedding. Han Xiao, Minlie Huang, Hao Yu, Xiaoyan Zhu. arXiv 2015. 

Paper:

https://arxiv.org/pdf/1509.05490.pdf

Abstract:

Knowledge representation is a major topic in AI, and many studies attempt to represent entities and relations of knowledge base in a continuous vector space. Among these attempts, translation-based methods build entity and relation vectors by minimizing the translation loss from a head entity to a tail one. In spite of the success of these methods, translation-based methods also suffer from the oversimplified loss metric, and are not competitive enough to model various and complex entities/relations in knowledge bases. To address this issue, we propose TransA, an adaptive metric approach for embedding, utilizing the metric learning ideas to provide a more flexible embedding method. Experiments are conducted on the benchmark datasets and our proposed method makes significant and consistent improvements over the state-of-the-art baselines.

/10 KG2E: Learning to Represent Knowledge Graphs with Gaussian Embedding. Shizhu He, Kang Liu, Guoliang Ji and Jun Zhao. CIKM 2015. 

Paper:

https://www.semanticscholar.org/paper/Learning-to-Represent-Knowledge-Graphs-with-Gaussi-He-Liu/02e2059c328bd9fad4e676266435199663bed804

Abstract:

The representation of a knowledge graph (KG) in a latent space recently has attracted more and more attention. To this end, some proposed models (e.g., TransE) embed entities and relations of a KG into a "point" vector space by optimizing a global loss function which ensures the scores of positive triplets are higher than negative ones. We notice that these models always regard all entities and relations in the same manner and ignore their (un)certainties. In fact, different entities and relations may contain different certainties, which makes identical certainty insufficient for modeling. Therefore, this paper switches to density-based embedding and proposes KG2E for explicitly modeling the certainty of entities and relations, which learns the representations of KGs in the space of multi-dimensional Gaussian distributions. Each entity/relation is represented by a Gaussian distribution, where the mean denotes its position and the covariance (currently with diagonal covariance) can properly represent its certainty. In addition, compared with the symmetric measures used in point-based methods, we employ the KL-divergence for scoring triplets, which is a natural asymmetry function for effectively modeling multiple types of relations. We have conducted extensive experiments on link prediction and triplet classification with multiple benchmark datasets (WordNet and Freebase). Our experimental results demonstrate that our method can effectively model the (un)certainties of entities and relations in a KG, and it significantly outperforms state-of-the-art methods (including TransH and TransR).
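
As a concrete reading of the KL-based score, here is a sketch with diagonal Gaussians: the entity pair induces the distribution P_e = N(mu_h - mu_t, var_h + var_t), which is compared against the relation's Gaussian. All values are random placeholders.

```python
# KG2E (sketch): score a triple by KL(P_e || P_r) between diagonal Gaussians.
import numpy as np

d = 50
mu_h, mu_t, mu_r = (np.random.randn(d) for _ in range(3))
var_h, var_t, var_r = (np.random.rand(d) + 0.1 for _ in range(3))  # diagonal covariances

mu_e, var_e = mu_h - mu_t, var_h + var_t           # transformed entity-pair distribution

# KL divergence for diagonal Gaussians; lower = more plausible triple.
kl = 0.5 * (np.sum(var_e / var_r)
            + np.sum((mu_r - mu_e) ** 2 / var_r)
            - d
            + np.sum(np.log(var_r) - np.log(var_e)))
```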

/11 DistMult: Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015.

Paper:

https://arxiv.org/pdf/1412.6575.pdf

Code:

https://github.com/thunlp/OpenKE

Abstract:

We consider learning representations of entities and relations in KBs using the neural-embedding approach. We show that most existing models, including NTN (Socher et al., 2013) and TransE (Bordes et al., 2013b), can be generalized under a unified learning framework, where entities are low-dimensional vectors learned from a neural network and relations are bilinear and/or linear mapping functions. Under this framework, we compare a variety of embedding models on the link prediction task. We show that a simple bilinear formulation achieves new state-of-the-art results for the task (achieving a top-10 accuracy of 73.2% vs. 54.7% by TransE on Freebase). Furthermore, we introduce a novel approach that utilizes the learned relation embeddings to mine logical rules such as BornInCity(a, b) ∧ CityInCountry(b, c) ⇒ Nationality(a, c). We find that embeddings learned from the bilinear objective are particularly good at capturing relational semantics, and that the composition of relations is characterized by matrix multiplication. More interestingly, we demonstrate that our embedding-based rule extraction approach successfully outperforms a state-of-the-art confidence-based rule mining approach in mining Horn rules that involve compositional reasoning.
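
The "simple bilinear formulation" amounts to restricting the relation matrix to a diagonal, so the score is a three-way elementwise product; a toy sketch with placeholder vectors:

```python
# DistMult: bilinear score with a diagonal relation matrix.
import numpy as np

d = 50
h, r, t = (np.random.randn(d) for _ in range(3))

score = np.sum(h * r * t)      # equivalent to h^T diag(r) t
```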

/12 PTransE: Modeling Relation Paths for Representation Learning of Knowledge Bases. Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, Song Liu. EMNLP 2015.

Paper:

https://arxiv.org/pdf/1506.00379.pdf

Code:

https://github.com/thunlp/KB2E

Abstract:

Representation learning of knowledge bases aims to embed both entities and relations into a low-dimensional space. Most existing methods only consider direct relations in representation learning. We argue that multiple-step relation paths also contain rich inference patterns between entities, and propose a path-based representation learning model. This model considers relation paths as translations between entities for representation learning, and addresses two key challenges: (1) Since not all relation paths are reliable, we design a path-constraint resource allocation algorithm to measure the reliability of relation paths. (2) We represent relation paths via semantic composition of relation embeddings. Experimental results on real-world datasets show that, as compared with baselines, our model achieves significant and consistent improvements on knowledge base completion and relation extraction from text. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
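
Here is a sketch of the additive (ADD) operator, one of the paper's ways to compose a relation path from its relation embeddings; the vectors are random placeholders.

```python
# PTransE (sketch): additive composition of a two-step relation path.
import numpy as np

d = 50
r1, r2 = np.random.randn(d), np.random.randn(d)    # relations along a two-step path
r_direct = np.random.randn(d)                      # embedding of the direct relation

p = r1 + r2                                        # compose the path by vector addition
energy = np.linalg.norm(p - r_direct, ord=1)       # path should translate like the relation
```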

/13 RTransE: Composing Relationships with Translations. Alberto García-Durán, Antoine Bordes, Nicolas Usunier. EMNLP 2015. 

Paper:

http://www.aclweb.org/anthology/D15-1034.pdf

Abstract:

Performing link prediction in Knowledge Bases (KBs) with embedding-based models, like with the model TransE (Bordes et al., 2013) which represents relationships as translations in the embedding space, has shown promising results in recent years. Most of these works are focused on modeling single relationships and hence do not take full advantage of the graph structure of KBs. In this paper, we propose an extension of TransE that learns to explicitly model composition of relationships via the addition of their corresponding translation vectors. We show empirically that this allows us to improve performance for predicting single relationships as well as compositions of pairs of them.

/14 ManifoldE: From One Point to A Manifold: Knowledge Graph Embedding For Precise Link Prediction. Han Xiao, Minlie Huang and Xiaoyan Zhu. IJCAI 2016.

Paper:

https://arxiv.org/pdf/1512.04792.pdf

Abstract:

Knowledge graph embedding aims at offering a numerical knowledge representation paradigm by transforming the entities and relations into continuous vector space. However, existing methods could not characterize the knowledge graph in a fine degree to make a precise link prediction. There are two reasons for this issue: being an ill-posed algebraic system and adopting an overstrict geometric form. As precise link prediction is critical for knowledge graph embedding, we propose a manifold-based embedding principle (ManifoldE) which could be treated as a well-posed algebraic system that expands point-wise modeling in current models to manifold-wise modeling. Extensive experiments show that the proposed models achieve substantial improvements against the state-of-the-art baselines, particularly for the precise prediction task, and yet maintain high efficiency. All of the related poster, slides, datasets and codes have been published in http://www.ibookman.net/conference.html.

/15 TransG: A Generative Mixture Model for Knowledge Graph Embedding. Han Xiao, Minlie Huang, Xiaoyan Zhu. ACL 2016. 

Paper:

http://www.aclweb.org/anthology/P16-1219

Code:

https://github.com/BookmanHan/Embedding

Abstract:

Recently, knowledge graph embedding, which projects symbolic entities and relations into continuous vector space, has become a new, hot topic in artificial intelligence. This paper proposes a novel generative model (TransG) to address the issue of multiple relation semantics that a relation may have multiple meanings revealed by the entity pairs associated with the corresponding triples. The new model can discover latent semantics for a relation and leverage a mixture of relation-specific component vectors to embed a fact triple. To the best of our knowledge, this is the first generative model for knowledge graph embedding, and for the first time, the issue of multiple relation semantics is formally discussed. Extensive experiments show that the proposed model achieves substantial improvements against the state-of-the-art baselines.

/16 ComplEx: Complex Embeddings for Simple Link Prediction. Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier and Guillaume Bouchard. ICML 2016. 

Paper:

http://proceedings.mlr.press/v48/trouillon16.pdf

Code:

https://github.com/thunlp/OpenKE

https://github.com/ttrouill/complex

Abstract:

In statistical relational learning, the link prediction problem is key to automatically understand the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach based on complex embeddings is arguably simpler, as it only uses the Hermitian dot product, the complex counterpart of the standard dot product between real vectors. Our approach is scalable to large datasets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.
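
The Hermitian dot product mentioned above is a one-liner in numpy; this hedged sketch scores a triple as Re(<h, r, conj(t)>) with random complex placeholder embeddings.

```python
# ComplEx: Hermitian three-way product over complex embeddings.
import numpy as np

d = 50
h, r, t = (np.random.randn(d) + 1j * np.random.randn(d) for _ in range(3))

score = np.real(np.sum(h * r * np.conj(t)))   # asymmetric in h and t, so antisymmetric relations are representable
```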

/17 HolE: Holographic Embeddings of Knowledge Graphs. Maximilian Nickel, Lorenzo Rosasco, Tomaso A. Poggio. AAAI 2016.

Paper:

http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12484/11828

Code:

https://github.com/thunlp/KRLPapers

Abstract:

Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HOLE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator, HOLE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. Experimentally, we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction on knowledge graphs and relational learning benchmark datasets.
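
Circular correlation can be computed in O(d log d) via the FFT, which is what keeps HolE cheap; a sketch with placeholder vectors:

```python
# HolE (sketch): circular correlation via FFT, corr(h, t) = ifft(conj(fft(h)) * fft(t)).
import numpy as np

d = 50
h, r, t = (np.random.randn(d) for _ in range(3))

corr = np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)))
score = r @ corr               # the paper feeds this through a sigmoid for a probability
```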

/18 KR-EAR: Knowledge Representation Learning with Entities, Attributes and Relations. Yankai Lin, Zhiyuan Liu, Maosong Sun. IJCAI 2016. 

Paper:

http://nlp.csai.tsinghua.edu.cn/~lyk/publications/ijcai2016_krear.pdf

Code:

https://github.com/thunlp/KR-EAR

Abstract:

Distributed knowledge representation (KR) encodes both entities and relations in a low-dimensional semantic space, which has significantly promoted the performance of relation extraction and knowledge reasoning. In many knowledge graphs (KG), some relations indicate attributes of entities (attributes) and others indicate relations between entities (relations). Existing KR models regard all relations equally, and usually suffer from poor accuracies when modeling one-to-many and many-to-one relations, mostly composed of attributes. In this paper, we distinguish existing KG relations into attributes and relations, and propose a new KR model with entities, attributes and relations (KR-EAR). The experiment results show that, by special modeling of attributes, KR-EAR can significantly outperform state-of-the-art KR models in prediction of entities, attributes and relations. The source code of this paper can be obtained from https://github.com/thunlp/KR-EAR.

/19 TranSparse: Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. Guoliang Ji, Kang Liu, Shizhu He, Jun Zhao. AAAI 2016. 

Paper:

http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/11982/11693

Code:

https://github.com/thunlp/KB2E

Abstract:

We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work including Trans(E, H, R, and D) ignores the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs. In this paper, we propose a novel approach TranSparse to deal with the two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse outperforms Trans(E, H, R, and D) significantly, and achieves state-of-the-art performance.

/20 TKRL: Representation Learning of Knowledge Graphs with Hierarchical Types. Ruobing Xie, Zhiyuan Liu, Maosong Sun. IJCAI 2016. 

Paper:

http://www.thunlp.org/~lzy/publications/ijcai2016_tkrl.pdf

Code:

https://github.com/thunlp/TKRL

Abstract:

Representation learning of knowledge graphs aims to encode both entities and relations into a continuous low-dimensional vector space. Most existing methods only concentrate on learning representations with structured information located in triples, regardless of the rich information located in hierarchical types of entities, which could be collected in most knowledge graphs. In this paper, we propose a novel method named Type-embodied Knowledge Representation Learning (TKRL) to take advantage of hierarchical entity types. We suggest that entities should have multiple representations in different types. More specifically, we consider hierarchical types as projection matrices for entities, with two type encoders designed to model hierarchical structures. Meanwhile, type information is also utilized as relation-specific type constraints. We evaluate our models on two tasks including knowledge graph completion and triple classification, and further explore the performances on a long-tail dataset. Experimental results show that our models significantly outperform all baselines on both tasks, especially with a long-tail distribution. It indicates that our models are capable of capturing hierarchical type information which is significant when constructing representations of knowledge graphs. The source code of this paper can be obtained from https://github.com/thunlp/TKRL.

/21 STransE: A Novel Embedding Model of Entities and Relationships in Knowledge Bases. Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. NAACL-HLT 2016.

Paper:

https://arxiv.org/pdf/1606.08140.pdf

Code:

https://github.com/datquocnguyen/STransE

Abstract:

Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction or knowledge base completion, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task.
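
Since STransE is essentially SE's two relation matrices combined with TransE's translation, the score is easy to sketch; the matrices and vectors below are random placeholders rather than trained parameters.

```python
# STransE (sketch): relation-specific matrices on both sides plus a translation.
import numpy as np

d = 50
h, r, t = (np.random.randn(d) for _ in range(3))
W1, W2 = np.random.randn(d, d), np.random.randn(d, d)   # relation-specific matrices

energy = np.linalg.norm(W1 @ h + r - W2 @ t, ord=1)     # lower = more plausible triple
```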

/22 GAKE: Graph Aware Knowledge Embedding. Jun Feng, Minlie Huang, Yang Yang, Xiaoyan Zhu. COLING 2016. 

Paper:

http://yangy.org/works/gake/gake-coling16.pdf

Code:

https://github.com/JuneFeng/GAKE

Abstract:

Knowledge embedding, which projects triples in a given knowledge base to d-dimensional vectors, has attracted considerable research efforts recently. Most existing approaches treat the given knowledge base as a set of triplets, each of whose representation is then learned separately. However, as a fact, triples are connected and depend on each other. In this paper, we propose a graph aware knowledge embedding method (GAKE), which formulates a knowledge base as a directed graph, and learns representations for any vertices or edges by leveraging the graph’s structural information. We introduce three types of graph context for embedding: neighbor context, path context, and edge context, each reflecting properties of knowledge from different perspectives. We also design an attention mechanism to learn representative power of different vertices or edges. To validate our method, we conduct several experiments on two tasks. Experimental results suggest that our method outperforms several state-of-the-art knowledge embedding models.

/23 DKRL: Representation Learning of Knowledge Graphs with Entity Descriptions. Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, Maosong Sun. AAAI 2016. 

Paper:

http://nlp.csai.tsinghua.edu.cn/~lzy/publications/aaai2016_dkrl.pdf

Code:

https://github.com/thunlp/DKRL

Abstract:

Representation learning (RL) of knowledge graphs aims to project both entities and relations into a continuous low-dimensional space. Most methods concentrate on learning representations with knowledge triples indicating relations between entities. In fact, in most knowledge graphs there are usually concise descriptions for entities, which cannot be well utilized by existing methods. In this paper, we propose a novel RL method for knowledge graphs taking advantage of entity descriptions. More specifically, we explore two encoders, including continuous bag-of-words and deep convolutional neural models to encode semantics of entity descriptions. We further learn knowledge representations with both triples and descriptions. We evaluate our method on two tasks, including knowledge graph completion and entity classification. Experimental results on real-world datasets show that, our method outperforms other baselines on the two tasks, especially under the zero-shot setting, which indicates that our method is capable of building representations for novel entities according to their descriptions. The source code of this paper can be obtained from https://github.com/xrb92/DKRL.

/24 ProPPR: Learning First-Order Logic Embeddings via Matrix Factorization. William Yang Wang, William W. Cohen. IJCAI 2016.

Paper:

https://www.cs.ucsb.edu/~william/papers/ijcai2016.pdf

Abstract:

Many complex reasoning tasks in Artificial Intelligence (including relation extraction, knowledge base completion, and information integration) can be formulated as inference problems using a probabilistic first-order logic. However, due to the discrete nature of logical facts and predicates, it is challenging to generalize symbolic representations and represent first-order logic formulas in probabilistic relational models. In this work, we take a rather radical approach: we aim at learning continuous low-dimensional embeddings for first-order logic from scratch. In particular, we first consider a structural gradient based structure learning approach to generate plausible inference formulas from facts; then, we build grounded proof graphs using background facts, training examples, and these inference formulas. To learn embeddings for formulas, we map the training examples into the rows of a binary matrix, and inference formulas into the columns. Using a scalable matrix factorization approach, we then learn the latent continuous representations of examples and logical formulas via a low-rank approximation method. In experiments, we demonstrate the effectiveness of reasoning with first-order logic embeddings by comparing with several state-of-the-art baselines on two datasets in the task of knowledge base completion.

/25 SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions. Han Xiao, Minlie Huang, Lian Meng, Xiaoyan Zhu. AAAI 2017. 

Paper:

http://www.aaai.org/Conferences/AAAI/2017/PreliminaryPapers/14-XiaoH-14306.pdf

Abstract:

Knowledge graph embedding represents entities and relations in a knowledge graph as low-dimensional, continuous vectors, and thus makes the knowledge graph compatible with machine learning models. Though there have been a variety of models for knowledge graph embedding, most methods merely concentrate on the fact triples, while supplementary textual descriptions of entities and relations have not been fully employed. To this end, this paper proposes the semantic space projection (SSP) model which jointly learns from the symbolic triples and textual descriptions. Our model builds interaction between the two information sources, and employs textual descriptions to discover semantic relevance and offer precise semantic embedding. Extensive experiments show that our method achieves substantial improvements against baselines on the tasks of knowledge graph completion and entity classification. 

/26 ProjE: Embedding Projection for Knowledge Graph Completion. Baoxu Shi, Tim Weninger. AAAI 2017.

Paper:

http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14279/13906

Code:

https://github.com/bxshi/ProjE

Abstract:

With the large volume of new information created every day, determining the validity of information in a knowledge graph and filling in its missing parts are crucial tasks for many researchers and practitioners. To address this challenge, a number of knowledge graph completion methods have been developed using low-dimensional graph embeddings. Although researchers continue to improve these models using an increasingly complex feature space, we show that simple changes in the architecture of the underlying model can outperform state-of-the-art models without the need for complex feature engineering. In this work, we present a shared variable neural network model called ProjE that fills in missing information in a knowledge graph by learning joint embeddings of the knowledge graph’s entities and edges, and through subtle, but important, changes to the standard loss function. In doing so, ProjE has a parameter size that is smaller than 11 out of 15 existing methods while performing 37% better than the current-best method on standard datasets. We also show, via a new fact checking task, that ProjE is capable of accurately determining the veracity of many declarative statements.

/27 ANALOGY: Analogical Inference for Multi-relational Embeddings. Hanxiao Liu, Yuexin Wu, Yiming Yang. ICML 2017.

Paper:

https://arxiv.org/pdf/1705.02426.pdf

Code:

https://github.com/mana-ysh/knowledge-graph-embeddings

Abstract:

Large-scale multi-relational embedding refers to the task of learning the latent representations for entities and relations in large knowledge graphs. An effective and scalable solution for this problem is crucial for the true success of knowledge-based inference in a broad range of applications. This paper proposes a novel framework for optimizing the latent representations with respect to the analogical properties of the embedded entities and relations. By formulating the learning objective in a differentiable fashion, our model enjoys both theoretical power and computational scalability, and significantly outperformed a large number of representative baseline methods on benchmark datasets. Furthermore, the model offers an elegant unification of several well-known methods in multi-relational embedding, which can be proven to be special instantiations of our framework.

/28 IKRL: Image-embodied Knowledge Representation Learning. Ruobing Xie, Zhiyuan Liu, Tat-Seng Chua, Huan-Bo Luan, Maosong Sun. IJCAI 2017.

Paper:

https://www.ijcai.org/proceedings/2017/0438.pdf

Code:

https://github.com/xrb92/IKRL

Abstract:

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

/29 IPTransE: Iterative Entity Alignment via Joint Knowledge Embeddings. Hao Zhu, Ruobing Xie, Zhiyuan Liu, Maosong Sun. IJCAI 2017. 

Paper:

https://www.ijcai.org/proceedings/2017/0595.pdf

Code:

https://github.com/thunlp/IEAJKE

Abstract:

Entity alignment aims to link entities and their counterparts among multiple knowledge graphs (KGs). Most existing methods typically rely on external information of entities such as Wikipedia links and require costly manual feature construction to complete alignment. In this paper, we present a novel approach for entity alignment via joint knowledge embeddings. Our method jointly encodes both entities and relations of various KGs into a unified low-dimensional semantic space according to a small seed set of aligned entities. During this process, we can align entities according to their semantic distance in this joint semantic space. More specifically, we present an iterative and parameter sharing method to improve alignment performance. Experiment results on real-world datasets show that, as compared to baselines, our method achieves significant improvements on entity alignment, and can further improve knowledge graph completion performance on various KGs with the aid of joint knowledge embeddings.
