Massively Parallel Single-Source SimRanks in $o(\log n)$ Rounds

Siqiang Luo, Zulun Zhu

SimRank is one of the most fundamental measures of structural similarity between two nodes in a graph and has been applied in a plethora of data management tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node $s$ and all other nodes. Due to its high computational complexity, single-source SimRank computation on large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, the theoretical aspects of distributed SimRank with provable results have rarely been studied. In this paper, we conduct a theoretical study of single-source SimRank computation in the Massively Parallel Computation (MPC) model, which is the standard theoretical framework for modeling distributed systems such as MapReduce, Hadoop, or Spark. Existing distributed SimRank algorithms require either $\Omega(\log n)$ communication rounds or $\Omega(n)$ space per machine for a graph of $n$ nodes. We overcome this barrier. In particular, given a graph of $n$ nodes, for any query node $v$ and constant error $\epsilon>\frac{3}{n}$, we show that $O(\log^2 \log n)$ rounds of communication among machines are almost enough to compute single-source SimRank values with absolute error at most $\epsilon$, while each machine needs only space sublinear in $n$. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that overcomes the $\Theta(\log n)$ round complexity barrier with provable result accuracy.
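For readers unfamiliar with the quantity being computed, the following is a minimal sketch of single-source SimRank on a tiny directed graph, using the standard recursive definition: a node is fully similar to itself, and two distinct nodes are similar in proportion to the average similarity of their in-neighbors, scaled by a decay factor $c$. It is a naive centralized iteration, not the MPC algorithm of the paper; the function names, the decay value 0.6, and the iteration count are assumptions chosen for illustration.

```python
from collections import defaultdict


def simrank(edges, c=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration (O(n^2) space)."""
    nodes = sorted({u for edge in edges for u in edge})
    in_nbrs = defaultdict(list)
    for u, v in edges:  # each pair is a directed edge u -> v
        in_nbrs[v].append(u)

    # Start from the identity: s(a, a) = 1, s(a, b) = 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    # A node without in-neighbors is similar only to itself.
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(x, y)]
                                for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return nodes, sim


def single_source_simrank(edges, s, c=0.6, iterations=10):
    """Return the SimRank value between source s and every node."""
    nodes, sim = simrank(edges, c, iterations)
    return {v: sim[(s, v)] for v in nodes}


if __name__ == "__main__":
    # Nodes "b" and "c" share the single in-neighbor "a".
    print(single_source_simrank([("a", "b"), ("a", "c")], "b"))
```

This sketch materializes all $n^2$ pairs on one machine, which is precisely the cost that makes large graphs hard and motivates the distributed setting studied here.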
