The sheer growth in the size of graph data has created considerable interest in developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning to minimize communication and achieve work balance. In this work we contribute to the recent line of research on streaming graph partitioning \cite{stantonstreaming,stanton,fennel}, which computes an approximately balanced $k$-partitioning of the vertex set of a graph in a single pass over the graph stream using degree-based criteria. This graph partitioning framework is well suited to processing large-scale and dynamic graphs. We introduce the use of higher-length walks for streaming graph partitioning and show that they incur only a minor computational cost while significantly improving the quality of the graph partition. We perform an average-case analysis of our algorithm using the planted partition model \cite{condon2001algorithms,mcsherry2001spectral}. We complement the recent results of Stanton \cite{stantonstreaming} by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the wide range of choices for the length of the walks, we show that the proposed length is optimal. Finally, we conduct experiments that verify the value of the proposed method.