Random walk centrality is a fundamental metric in graph mining for quantifying node importance and influence, defined as the weighted average of hitting times to a node from all other nodes. Despite its ability to capture rich graph structural information and its wide range of applications, computing this measure for large networks remains impractical due to the computational demands of existing methods. In this paper, we present a novel formulation of random walk centrality, underpinning two scalable algorithms: one leveraging approximate Cholesky factorization and sparse inverse estimation, while the other sampling rooted spanning trees. Both algorithms operate in near-linear time and provide strong approximation guarantees. Extensive experiments on large real-world networks, including one with over 10 million nodes, demonstrate the efficiency and approximation quality of the proposed algorithms.
翻译:随机游走中心性是图挖掘中用于量化节点重要性和影响力的基本指标,其定义为从所有其他节点到达该节点的命中时间的加权平均值。尽管该度量能够捕捉丰富的图结构信息且应用广泛,但由于现有方法的计算需求,在大型网络中计算该度量仍不切实际。本文提出了一种随机游走中心性的新颖表述,并基于此构建了两种可扩展算法:一种利用近似Cholesky分解和稀疏逆估计,另一种则通过采样有根生成树实现。两种算法均以近线性时间复杂度运行,并提供强有力的近似保证。在包括超过1000万个节点的大型真实网络上的大量实验验证了所提算法的效率和近似质量。