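Project title: Research on a Parallel Large-Graph Processing Framework Based on the Hierarchical Structure of Systems
Project number: No.61300014
Project type: Young Scientists Fund project
Year approved: 2014
Discipline: Automation technology, computer technology
Project author: Zhang Xi (张熙)
Affiliation: Beijing University of Posts and Telecommunications
Funding: 250,000 CNY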
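Chinese abstract (translated): With the rise of social networks, computation, analysis, and mining on large-scale graph structures have become a research topic of significant value. The continual growth of data and the increasing complexity of the relationships within it pose challenges both for the design of parallel system architectures and for the design of graph partitioning algorithms. To meet the requirements of high performance, high efficiency, and ease of programming in large-scale parallel graph processing, this project proposes a novel large-graph processing framework that closely integrates large-graph partitioning algorithms with the hierarchical structure of the system. The framework exploits the structural characteristics of each level of a distributed system to build a partitioning cost model and to design efficient partitioning methods suited to the requirements of each level. On this basis, a unified hierarchical large-graph partitioning framework is established. To further improve efficiency, the project analyzes the storage and access patterns of large-graph data and optimizes the mechanisms for data distribution, data replication, inter-node communication, I/O access, and task scheduling. For various types of graph structures, such as natural graphs and dynamic graphs, it proposes self-adaptive and self-optimizing processing mechanisms. Finally, an open-source framework and a prototype system will be built and evaluated in application domains such as social networks and telecommunication networks.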
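Chinese keywords (translated): large-graph processing; graph query; information diffusion; social network analysis; storage system optimization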
English abstract: With the development of social networks, processing, analyzing, and mining huge real-world graphs has become an active research area. Graphs express the relationships among objects and are therefore of great value. As the scale and dependencies of graph data increase, designing efficient systems and algorithms for large-scale graph processing becomes challenging. This study proposes a new parallel framework for large-scale graph processing that optimizes graph partitioning algorithms for the hierarchical structure of distributed systems. The proposal analyzes the partitioning cost at each level of the system structure and builds a cost model. Efficient partitioning methods are proposed that exploit the architectural features of each level. A unified large-scale graph partitioning mechanism is proposed together with performance optimization techniques covering data placement, data replication, node communication, I/O, and task scheduling. Self-adaptive and self-optimizing mechanisms are also proposed for natural graphs and streaming graphs. Finally, an open-source framework and prototypes will be built and evaluated in several application domains.
English keywords: large-scale graph computing; graph query; information diffusion; social network analysis; memory optimization
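The abstracts describe a partitioning cost model built level by level over the hierarchy of a distributed system. As a minimal sketch under stated assumptions, the Python code below models a hypothetical two-level hierarchy (machines, each with several cores). It charges each edge a cost that depends on the lowest level its two endpoints share, and it uses a simple one-pass greedy placement to show how a partitioner might consult such a model. The cost constants, the `(machine, core)` slot encoding, and the functions `edge_cost`, `partition_cost`, and `greedy_place` are illustrative assumptions. They are not the project's actual cost model or partitioning method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical two-level hierarchy: a slot is (machine_id, core_id).
Slot = Tuple[int, int]
Edge = Tuple[int, int]

# Assumed per-edge costs for each hierarchy level (illustrative values only).
COST_SAME_CORE = 0.0       # both endpoints on the same core
COST_SAME_MACHINE = 1.0    # different cores, same machine (shared memory)
COST_CROSS_MACHINE = 10.0  # different machines (network message)


def edge_cost(a: Slot, b: Slot) -> float:
    """Cost of one edge, determined by the lowest level its endpoints share."""
    if a == b:
        return COST_SAME_CORE
    if a[0] == b[0]:
        return COST_SAME_MACHINE
    return COST_CROSS_MACHINE


def partition_cost(edges: Iterable[Edge], placement: Dict[int, Slot]) -> float:
    """Total hierarchical communication cost of a vertex placement."""
    return sum(edge_cost(placement[u], placement[v]) for u, v in edges)


def greedy_place(vertices: Iterable[int], edges: List[Edge],
                 machines: int, cores: int, capacity: int) -> Dict[int, Slot]:
    """One-pass greedy placement: put each vertex in the non-full slot that
    adds the least cost w.r.t. already-placed neighbors (ties: lighter slot)."""
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    slots = [(m, c) for m in range(machines) for c in range(cores)]
    load: Dict[Slot, int] = defaultdict(int)
    placement: Dict[int, Slot] = {}

    for v in vertices:
        best_slot, best_key = None, None
        for slot in slots:
            if load[slot] >= capacity:
                continue  # respect the per-slot capacity
            # Cost this vertex would add, counting only placed neighbors.
            added = sum(edge_cost(slot, placement[n])
                        for n in neighbors[v] if n in placement)
            key = (added, load[slot])
            if best_key is None or key < best_key:
                best_slot, best_key = slot, key
        if best_slot is None:
            raise ValueError("not enough slot capacity for all vertices")
        placement[v] = best_slot
        load[best_slot] += 1
    return placement
```

For example, take two triangles joined by one edge, `edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]`. Under the assumed constants, `greedy_place(range(6), edges, machines=2, cores=2, capacity=2)` yields a placement whose `partition_cost` is 22. Placing {0, 1} on core (0, 0), {2} on (0, 1), {3} on (1, 0), and {4, 5} on (1, 1) costs 14. A single greedy pass ignores balance across machines. A per-level cost model like this one makes such differences between placement strategies measurable.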