Distance Encoding -- Design Provably More Powerful GNNs for Structural Representation Learning

Learning structural representations of node sets from graph-structured data is crucial for applications ranging from node-role discovery to link prediction and molecule classification. Graph Neural Networks (GNNs) have achieved great success in structural representation learning. However, most GNNs are limited by the 1-Weisfeiler-Lehman (WL) test and may thus generate identical representations for structures and graphs that are actually different. More powerful GNNs, proposed recently by mimicking higher-order WL tests, focus only on entire-graph representations and cannot exploit the sparsity of the graph structure to be computationally efficient. Here we propose a general class of structure-related features, termed Distance Encoding (DE), to assist GNNs in representing node sets of arbitrary size with strictly more expressive power than the 1-WL test. DE essentially captures the distance between the node set whose representation is to be learnt and each node in the graph, and includes important graph-related measures such as shortest-path distance and generalized PageRank scores. We propose two general frameworks for GNNs to use DEs: (1) as extra node attributes and (2) further as controllers of message aggregation in GNNs. Both frameworks may still exploit the sparse graph structure to remain scalable on large graphs. In theory, we prove that these two frameworks can distinguish node sets embedded in almost all regular graphs, where traditional GNNs always fail. We also rigorously analyze their limitations. Empirically, we evaluate these two frameworks on node structural-role prediction, link prediction and triangle prediction over six real networks. The results show that our models outperform GNNs without DEs by up to 15% in average accuracy and AUC. Our models also significantly outperform other SOTA baselines designed specifically for those tasks.
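
The DE idea above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`spd_encoding`, `rlp_encoding`), the truncation parameter `max_dist`, the walk length `steps`, and the use of summation to aggregate over the target node set are all assumptions made here for illustration. Random-walk landing probabilities stand in for the "generalized PageRank scores" mentioned in the abstract. The example compares a 6-cycle with two disjoint triangles: both are 2-regular, so 1-WL gives every node the same colour in both graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source`; unreachable nodes map to None."""
    dist = {v: None for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def spd_encoding(adj, target_set, max_dist=3):
    """Hypothetical SPD-style DE: for every node v, sum over u in target_set
    of a one-hot vector of the truncated distance d(v, u). The last index
    (max_dist + 1) collects larger distances and unreachable pairs."""
    size = max_dist + 2
    enc = {v: [0] * size for v in adj}
    for u in target_set:
        for v, d in bfs_distances(adj, u).items():
            bucket = d if d is not None and d <= max_dist else max_dist + 1
            enc[v][bucket] += 1
    return enc


def rlp_encoding(adj, target_set, steps=3):
    """Hypothetical landing-probability DE: entry k-1 is the probability that
    a simple random walk started at u is at v after k steps, summed over u in
    target_set (columns of W^k with W = A D^-1)."""
    enc = {v: [0.0] * steps for v in adj}
    for u in target_set:
        prob = {v: 0.0 for v in adj}
        prob[u] = 1.0
        for k in range(steps):
            nxt = {v: 0.0 for v in adj}
            for v, p in prob.items():
                if p and adj[v]:  # isolated nodes have no outgoing moves
                    share = p / len(adj[v])
                    for w in adj[v]:
                        nxt[w] += share
            prob = nxt
            for v in adj:
                enc[v][k] += prob[v]
    return enc


def cycle(n):
    """Adjacency lists of the n-cycle 0-1-...-(n-1)-0."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Two 2-regular graphs on 6 nodes, each containing the edge (0, 1).
hexagon = cycle(6)
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

hex_de = spd_encoding(hexagon, {0, 1})
tri_de = spd_encoding(two_triangles, {0, 1})
# Compare the multisets of per-node encodings for target set {0, 1}.
print(sorted(map(tuple, hex_de.values())) != sorted(map(tuple, tri_de.values())))  # True
```

The final check prints `True`: relative to the target set {0, 1}, the hexagon has nodes at distances (2, 1) and (3, 2), while the two-triangles graph has a common neighbour at distances (1, 1) and three unreachable nodes, so the encodings separate two structures that 1-WL colours alone cannot. In the first framework such per-node vectors could be concatenated to the node attributes before ordinary message passing, which keeps the computation on the sparse adjacency lists.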
