Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (12): 3374-3377. DOI: 10.11772/j.issn.1001-9081.2015.12.3374

• Advanced computing •

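MapReduce performance model based on multi-phase dividing

LI Zhenju, LI Xuejun, YANG Sheng, LIU Tao

  1. Department of Information Equipment, Equipment Academy of PLA, Beijing 101416, China
  • Received: 2015-06-09 Revised: 2015-08-27 Online: 2015-12-10 Published: 2015-12-10
  • Corresponding author: LI Zhenju (born 1987), male, from Anyang, Henan; assistant engineer, Ph.D. candidate, CCF member. Research interests: cloud computing, remote sensing data management.
  • About the authors: LI Xuejun (born 1967), male, from Jianli, Hubei; professor, Ph.D., CCF member. Research interests: communication and information systems, computer graphics. YANG Sheng (born 1985), male, from Hanzhong, Shaanxi; lecturer, Ph.D. Research interests: remote sensing image matching. LIU Tao (born 1979), male, from Weinan, Shaanxi; lecturer, Ph.D. Research interests: computer graphics, parallel processing of remote sensing images.
  • Supported by:
    Pre-research Project of the General Armament Department (513150701).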

Abstract: Existing MapReduce models divide a job into phases at an unsuitable granularity, which compromises either the accuracy or the complexity of the model. To address this, a multi-phase MapReduce model (MR-Model) with a phase partitioning granularity of 5 is proposed. First, the state of research on MapReduce models is reviewed. Then a MapReduce job is divided into five phases (Read, Map, Shuffle, Reduce, and Write), and the running time of each phase is studied. Finally, the prediction performance of the model is verified by experiments. The experimental results show that MR-Model can describe the execution of real MapReduce jobs. Compared with two models of different partitioning granularity, P-Model and H-Model, MR-Model improves running-time prediction accuracy by 10% to 30% and improves running-time prediction accuracy in the Reduce phase by a factor of 2 to 3, giving better overall performance.

Key words: cloud computing, MapReduce, performance model, multi-phase division, partition granularity
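
The kind of accuracy comparison described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that job running time is the plain sum of the five phase times (a simplification: real jobs may overlap Shuffle with Map, and the abstract does not give the paper's formulas) and that accuracy is judged by relative error against measured time. The function names and every number below are hypothetical and are not taken from the paper.

```python
from typing import Dict

# The five phases named in the abstract.
PHASES = ("read", "map", "shuffle", "reduce", "write")


def total_time(phase_times: Dict[str, float]) -> float:
    """Sum the five phase times (assumes phases run back to back)."""
    missing = [p for p in PHASES if p not in phase_times]
    if missing:
        raise ValueError(f"missing phase times: {missing}")
    return sum(phase_times[p] for p in PHASES)


def relative_error(predicted: float, measured: float) -> float:
    """Relative prediction error; smaller means a more accurate model."""
    if measured <= 0:
        raise ValueError("measured time must be positive")
    return abs(predicted - measured) / measured


def phase_errors(predicted: Dict[str, float],
                 measured: Dict[str, float]) -> Dict[str, float]:
    """Per-phase relative error, e.g. to inspect the Reduce phase alone."""
    return {p: relative_error(predicted[p], measured[p]) for p in PHASES}


if __name__ == "__main__":
    # Hypothetical phase times in seconds, for illustration only.
    predicted = {"read": 10.0, "map": 40.0, "shuffle": 20.0,
                 "reduce": 25.0, "write": 5.0}
    measured = {"read": 12.0, "map": 42.0, "shuffle": 24.0,
                "reduce": 24.0, "write": 8.0}
    job_error = relative_error(total_time(predicted), total_time(measured))
    print(f"job-level relative error: {job_error:.3f}")
    print(f"Reduce-phase relative error: "
          f"{phase_errors(predicted, measured)['reduce']:.3f}")
```

Comparing models of different partitioning granularity would then amount to computing these errors for each model's predictions against the same measured job.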
