This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution to this problem would integrate both the text and the graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging on large graphs due to the high computational cost of large language models and of training GNNs on big graphs. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs that fuses graph structure learning and language learning in a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training the large language model and the GNN on a big graph, GLEM alternately updates the two modules in the E-step and the M-step. Such a procedure allows the two modules to be trained separately while still letting them interact and mutually enhance each other. Extensive experiments on multiple datasets demonstrate the efficiency and effectiveness of the proposed approach.
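Since the abstract only sketches the alternating E-step/M-step procedure, the following is a minimal, self-contained PyTorch sketch of that alternation. Everything below is an illustrative assumption rather than the paper's implementation: `TextEncoder` and `MeanGNN` are toy stand-ins for the actual language model and GNN, and the pseudo-labeling objectives are simplified to single cross-entropy terms on hard pseudo-labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Hypothetical stand-in for the language model: maps node text features
    to class logits and exposes its hidden representation via embed()."""
    def __init__(self, d_in, d_hid, n_cls):
        super().__init__()
        self.body = nn.Linear(d_in, d_hid)
        self.head = nn.Linear(d_hid, n_cls)
    def embed(self, x):
        return torch.relu(self.body(x))
    def forward(self, x):
        return self.head(self.embed(x))

class MeanGNN(nn.Module):
    """Hypothetical one-layer GNN: mean aggregation over a row-normalized
    adjacency matrix followed by a linear classifier."""
    def __init__(self, d_hid, n_cls):
        super().__init__()
        self.lin = nn.Linear(d_hid, n_cls)
    def forward(self, adj, h):
        return self.lin(adj @ h)

def alternating_em(lm, gnn, adj, x, y, train_idx, rounds=3, inner_steps=50):
    """Alternating EM-style training in the spirit of the abstract:
    the E-step updates the LM against gold labels plus GNN pseudo-labels;
    the M-step updates the GNN against gold labels plus LM pseudo-labels.
    Only one module is optimized at a time, so the two are never trained jointly."""
    lm_opt = torch.optim.Adam(lm.parameters(), lr=1e-2)
    gnn_opt = torch.optim.Adam(gnn.parameters(), lr=1e-2)
    for _ in range(rounds):
        # E-step: freeze the GNN, distill its predictions into the LM.
        with torch.no_grad():
            target = gnn(adj, lm.embed(x)).argmax(-1)
        target[train_idx] = y[train_idx]  # keep gold labels where available
        for _ in range(inner_steps):
            loss = F.cross_entropy(lm(x), target)
            lm_opt.zero_grad(); loss.backward(); lm_opt.step()
        # M-step: freeze the LM, train the GNN on its embeddings and pseudo-labels.
        with torch.no_grad():
            h = lm.embed(x)
            target = lm(x).argmax(-1)
        target[train_idx] = y[train_idx]
        for _ in range(inner_steps):
            loss = F.cross_entropy(gnn(adj, h), target)
            gnn_opt.zero_grad(); loss.backward(); gnn_opt.step()
    return lm, gnn
```

Note the design consequence the abstract alludes to: because each inner loop optimizes a single module while the other is frozen, the language model never backpropagates through graph aggregation and the GNN never backpropagates through the language model, which is what makes the separate training tractable on large graphs.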