大规模局部估计命中时间 (Estimating Hitting Times Locally At Scale)

Hitting times provide a fundamental measure of distance in random processes, quantifying the expected number of steps for a random walk starting at node $u$ to reach node $v$. They have broad applications across domains such as network centrality analysis, ranking and recommendation systems, and epidemiology. In this work, we develop local algorithms for estimating hitting times between a pair of vertices $u,v$ without accessing the full graph, overcoming scalability issues of prior global methods. Our first algorithm uses the key insight that hitting time computations can be truncated at the meeting time of two independent random walks from $u$ and $v$. This leads to an efficient estimator analyzed via the Kronecker product graph and Markov Chain Chernoff bounds. We also present an algorithm extending the work of [Peng et al.; KDD 2021], that introduces a novel adaptation of the spectral cutoff technique to account for the asymmetry of hitting times. This adaptation captures the directionality of the underlying random walk and requires non-trivial modifications to ensure accuracy and efficiency. In addition to the algorithmic upper bounds, we also provide tight asymptotic lower bounds. We also reveal a connection between hitting time estimation and distribution testing, and validate our algorithms using experiments on both real and synthetic data.

翻译：命中时间为随机过程中的距离提供了基本度量，量化了从节点$u$出发的随机游走到达节点$v$的期望步数。它在网络中心性分析、排序与推荐系统、流行病学等多个领域具有广泛应用。本研究开发了用于估计一对顶点$u,v$之间命中时间的局部算法，无需访问完整图结构，从而克服了先前全局方法的可扩展性问题。我们的首个算法基于关键洞见：命中时间计算可在从$u$和$v$出发的两个独立随机游走的相遇时间处截断。通过克罗内克积图和马尔可夫链切尔诺夫界分析，我们得到了一种高效估计器。我们还提出一种扩展[Peng等人; KDD 2021]工作的算法，该算法引入谱截断技术的新颖变体以处理命中时间的不对称性。这种改进捕捉了底层随机游走的方向性，并需要非平凡的修正以确保准确性与效率。除算法上界外，我们还提供了紧致的渐近下界。我们进一步揭示了命中时间估计与分布测试之间的联系，并通过真实与合成数据实验验证了所提算法的有效性。