计算机应用 ›› 2015, Vol. 35 ›› Issue (1): 43-47.DOI: 10.11772/j.issn.1001-9081.2015.01.0043

• 先进计算 • 上一篇    下一篇

云平台下图数据处理技术

刘超, 唐郑望, 姚宏, 胡成玉, 梁庆中   

  1. 中国地质大学(武汉) 计算机学院, 武汉430074
  • 收稿日期:2014-07-18 修回日期:2014-09-07 出版日期:2015-01-01 发布日期:2015-01-26
  • 通讯作者: 姚宏
  • 作者简介:刘超(1979-),男,湖北武汉人,讲师,硕士,CCF会员,主要研究方向:云计算、分布式系统;唐郑望(1992-),男,湖北荆州人,硕士研究生,主要研究方向:大数据处理;姚宏(1976-),男,河南许昌人,副教授,博士,主要研究方向:移动计算、物联网;胡成玉(1978-),男,湖北襄阳人,副教授,博士,主要研究方向:云计算、水管网优化;梁庆中(1979-),男,广西桂林人,讲师,硕士,主要研究方向:移动互联网与优化.
  • 基金资助:

    国家自然科学基金资助项目(61272470, 61305087);中央高校基本业务费专项资金资助项目(CUGL130233).

Graph data processing technology in cloud platform

LIU Chao, TANG Zhengwang, YAO Hong, HU Chengyu, LIANG Qingzhong   

  1. School of Computer Science, China University of Geosciences, Wuhan Hubei 430074, China
  • Received:2014-07-18 Revised:2014-09-07 Online:2015-01-01 Published:2015-01-26

摘要:

针对Hadoop云平台下MapReduce计算模型在处理图数据时效率低下的问题,提出了一种类似谷歌Pregel的图数据处理计算框架——MyBSP.首先,分析了MapReduce的运行机制及不足之处;其次,阐述了MyBSP框架的结构、工作流程及主要接口;最后,在分析PageRank图处理算法原理的基础上,设计并实现了基于MyBSP框架的PageRank算法.实验结果表明,基于MyBSP框架的图数据处理算法与基于MapReduce的算法相比,迭代处理的性能提升了1.9~3倍.MyBSP算法的执行时间减少了67%,能够满足图数据高效处理的应用前景.

关键词: 图数据处理, 云计算, MapReduce计算模型, 批量同步并行模型, PageRank算法

Abstract:

MapReduce computation model can not satisfy the efficiency requirement of graph data processing in the Hadoop cloud platform. In order to address the issue, a novel computation framework of graph data processing, called MyBSP (My Bulk Synchronous Parallel), was proposed. MyBSP is similar with Pregel developed from Google. Firstly, the running mechanism and shortcomings of MapReduce were analyzed. Secondly, the structure, workflow and principal interfaces of MyBSP framework were described. Finally, the principle of the PageRank algorithm for graph data processing was analyzed. Subsequently, the design and implementation of the PageRank algorithm for graph data processing were presented. The experimental results show that, the iteration processing performance of graph data processing algorithm based on the MyBSP framework is raised by 1.9-3 times compared with the algorithm based on MapReduce. Furthermore, the execution time of the MyBSP algorithm is reduced by 67% compared with MapReduce approach. Thus, MyBSP can efficiently meet the application prospect of graph data processing.

Key words: graph data processing, cloud computing, MapReduce computation model, Bulk Synchronous Parallel (BSP) model, PageRank algorithm

中图分类号: