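Graph data processing technology in cloud platform

Journal of Computer Applications, 2015, Vol. 35, Issue (1): 43-47. DOI: 10.11772/j.issn.1001-9081.2015.01.0043

LIU Chao, TANG Zhengwang, YAO Hong, HU Chengyu, LIANG Qingzhong

School of Computer Science, China University of Geosciences, Wuhan Hubei 430074, China

Received: 2014-07-18 Revised: 2014-09-07 Online: 2015-01-01 Published: 2015-01-26
Corresponding author: YAO Hong
About the authors: LIU Chao (1979-), male, born in Wuhan, Hubei, lecturer, M.S., CCF member; main research interests: cloud computing, distributed systems. TANG Zhengwang (1992-), male, born in Jingzhou, Hubei, master's student; main research interest: big data processing. YAO Hong (1976-), male, born in Xuchang, Henan, associate professor, Ph.D.; main research interests: mobile computing, Internet of Things. HU Chengyu (1978-), male, born in Xiangyang, Hubei, associate professor, Ph.D.; main research interests: cloud computing, water pipe network optimization. LIANG Qingzhong (1979-), male, born in Guilin, Guangxi, lecturer, M.S.; main research interests: mobile Internet and optimization.
Supported by:
National Natural Science Foundation of China (61272470, 61305087); Fundamental Research Funds for the Central Universities (CUGL130233).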

Abstract:

The MapReduce computation model is inefficient at processing graph data on the Hadoop cloud platform. To address this issue, a graph data processing framework called MyBSP (My Bulk Synchronous Parallel), similar to Google's Pregel, was proposed. Firstly, the running mechanism and shortcomings of MapReduce were analyzed. Secondly, the structure, workflow and principal interfaces of the MyBSP framework were described. Finally, based on an analysis of the principle of the PageRank graph processing algorithm, a PageRank algorithm based on the MyBSP framework was designed and implemented. The experimental results show that, compared with the MapReduce-based algorithm, the graph data processing algorithm based on the MyBSP framework improves iteration processing performance by 1.9 to 3 times. Furthermore, the execution time of the MyBSP algorithm is reduced by 67% compared with the MapReduce approach. MyBSP can therefore meet the need for efficient graph data processing.

Key words: graph data processing, cloud computing, MapReduce computation model, Bulk Synchronous Parallel (BSP) model, PageRank algorithm
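
The vertex-centric, superstep-based style of PageRank that BSP frameworks such as Pregel use can be illustrated with a minimal single-process sketch. This is not the MyBSP interface, which the abstract does not show: the names `compute` and `run_pagerank`, the damping factor 0.85, and the fixed number of supersteps are all assumptions made for illustration. The sketch also assumes every vertex has at least one outgoing edge.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed, not from the paper)


def compute(rank, out_edges, incoming, superstep, num_vertices):
    """Hypothetical per-vertex function run once per superstep.

    Updates the vertex rank from the messages received in the previous
    superstep, then returns the new rank and the messages to send.
    """
    if superstep > 0:
        rank = (1.0 - DAMPING) / num_vertices + DAMPING * sum(incoming)
    share = rank / len(out_edges) if out_edges else 0.0
    return rank, [(target, share) for target in out_edges]


def run_pagerank(graph, supersteps=30):
    """Simulate BSP supersteps over an adjacency-list graph {vertex: [targets]}."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    inbox = defaultdict(list)  # messages delivered at the start of a superstep
    for step in range(supersteps):
        outbox = defaultdict(list)
        for v, out_edges in graph.items():
            ranks[v], messages = compute(ranks[v], out_edges, inbox[v], step, n)
            for target, value in messages:
                outbox[target].append(value)
        # Barrier synchronization: messages sent in this superstep become
        # visible only in the next one.
        inbox = outbox
    return ranks


if __name__ == "__main__":
    example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for vertex, rank in sorted(run_pagerank(example).items()):
        print(vertex, round(rank, 4))
```

In this style each iteration keeps ranks and messages in memory between supersteps, whereas an iterative MapReduce job is typically launched once per iteration; the sketch only illustrates the programming model and makes no claim about the measured speedup.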

CLC Number:

TP391
TP311

LIU Chao, TANG Zhengwang, YAO Hong, HU Chengyu, LIANG Qingzhong. Graph data processing technology in cloud platform[J]. Journal of Computer Applications, 2015, 35(1): 43-47.

