Biharmonic distance (\bd) is a powerful graph distance metric with many applications, including identifying critical links in road networks and mitigating over-squashing problem in \gnn. However, computing \bd\ is extremely difficult, especially on large graphs. In this paper, we focus on the problem of \emph{single-pair} \bd\ query. Existing methods mainly rely on random walk-based approaches, which work well on some graphs but become inefficient when the random walk cannot mix rapidly.To overcome this issue, we first show that the biharmonic distance between two nodes $s,t$, denoted by $b(s,t)$, can be interpreted as the distance between two random walk distributions starting from $s$ and $t$. To estimate these distributions, the required random walk length is large when the underlying graph can be easily cut into smaller pieces. Inspired by this observation, we present novel formulas of \bd to represent $b(s,t)$ by independent random walks within two node sets $\mathcal{V}_s$, $\mathcal{V}_t$ separated by a small \emph{cut set} $\mathcal{V}_{cut}$, where $\mathcal{V}_s\cup\mathcal{V}_t\cup\mathcal{V}_{cut}=\mathcal{V}$ is the set of graph nodes. Building upon this idea, we propose \bindex, a novel index structure which follows a divide-and-conquer strategy. The graph is first cut into pieces so that each part can be processed easily. Then, all the required random walk probabilities can be deterministically computed in a bottom-top manner. When a query comes, only a small part of the index needs to be accessed. We prove that \bindex\ requires $O(n\cdot h)$ space, can be built in $O(n\cdot h\cdot (h+d_{max}))$ time, and answers each query in $O(n\cdot h)$ time, where $h$ is the height of a hierarchy partition tree and $d_{max}$ is the maximum degree, which are both usually much smaller than $n$.
翻译:双调和距离(\\bd)是一种强大的图距离度量,具有多种应用,包括识别道路网络中的关键链路以及缓解图神经网络(\\gnn)中的过度压缩问题。然而,计算\\bd极其困难,尤其是在大规模图上。本文聚焦于\\emph{单对节点}\\bd查询问题。现有方法主要依赖基于随机游走的方法,这些方法在某些图上表现良好,但当随机游走无法快速混合时效率低下。为解决此问题,我们首先证明两个节点$s,t$之间的双调和距离(记为$b(s,t)$)可解释为从$s$和$t$出发的两个随机游走分布之间的距离。当底层图易于分割为较小部分时,估计这些分布所需的随机游走长度会很大。受此观察启发,我们提出了\\bd的新公式,通过两个被小型\\emph{割集}$\\mathcal{V}_{cut}$分隔的节点集$\\mathcal{V}_s$、$\\mathcal{V}_t$内的独立随机游走表示$b(s,t)$,其中$\\mathcal{V}_s\\cup\\mathcal{V}_t\\cup\\mathcal{V}_{cut}=\\mathcal{V}$为图节点集。基于此思想,我们提出\\bindex,一种遵循分治策略的新型索引结构。首先将图分割为易于处理的子部分,然后以自底向上的方式确定性计算所有所需的随机游走概率。当查询到达时,仅需访问索引的一小部分。我们证明\\bindex需要$O(n\\cdot h)$空间,可在$O(n\\cdot h\\cdot (h+d_{max}))$时间内构建,并在$O(n\\cdot h)$时间内响应每个查询,其中$h$为层次分割树的高度,$d_{max}$为最大节点度,两者通常远小于$n$。