Journal of Computer Applications, 2015, Vol. 35, Issue (9): 2476-2481. DOI: 10.11772/j.issn.1001-9081.2015.09.2476

• Advanced Computing •

MapReduce performance optimization in heterogeneous environments based on an anomaly detection model

HOU Jialin, WANG Jiajun, NIE Hongyu

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
  • Received: 2015-04-30 Revised: 2015-07-08 Online: 2015-09-10 Published: 2015-09-17
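  • Corresponding author: HOU Jialin (born 1990), male, from Luoyang, Henan, master's student; main research interests: MapReduce parallel computing and Tibetan public opinion monitoring; houjia_lin@foxmail.com
  • About the authors: WANG Jiajun (born 1990), male, from Gaocheng, Hebei, master's student; main research interest: Tibetan public opinion monitoring. NIE Hongyu (born 1989), female, from Neijiang, Sichuan, master's student; main research interest: Tibetan public opinion monitoring.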
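  • Supported by:
    Special Research Project of the Fundamental Research Funds for the Central Universities (SWJTU11ZT08); "Twelfth Five-Year Plan" Research Program of the State Language Commission (YB125-49).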

MapReduce performance optimization based on anomaly detection model in heterogeneous cloud environment

HOU Jialin, WANG Jiajun, NIE Hongyu   

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 610031, China
  • Received: 2015-04-30 Revised: 2015-07-08 Online: 2015-09-10 Published: 2015-09-17

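Abstract: To address the problem of selecting "stragglers", a method is proposed that selects them with an anomaly detection model of the kind commonly used in fault diagnosis. First, an anomaly detection algorithm is used to find the "slow nodes" in the cluster. Then the MapReduce task assignment and speculative execution algorithms are improved so that "slow nodes" receive no further tasks and the tasks on them are reassigned to normal nodes with idle task slots. Because nodes in the same network segment are usually physically close, which speeds up data transfer, the improved speculative execution algorithm is the first to reassign tasks from "slow nodes" to normal nodes in the same network segment. Validation on a real case shows that the anomaly detection algorithm quickly identifies abnormal nodes and that, compared with the Hadoop-LATE algorithm, it shortens the cluster's processing time for the same workload by 17%, indicating that the proposed algorithm performs well in optimizing overall cluster performance.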

Keywords: anomaly detection, MapReduce performance optimization, speculative execution, heterogeneous environment

Abstract: To select straggler machines effectively, a selection method based on an anomaly detection model of the kind commonly used in failure analysis is proposed. First, an anomaly detection algorithm is employed to detect the slow nodes in the cluster. Second, the task assignment and speculative execution algorithms are improved so that no new tasks are assigned to slow nodes and the tasks already on slow nodes are reassigned to normal nodes with idle slots. In the improved speculative execution algorithm, tasks on slow nodes are, for the first time, transferred to normal nodes in the same network segment, because nodes in one segment are usually physically close and data transfer between them is faster. The experimental results demonstrate that the anomaly detection algorithm detects the straggler machines quickly. Compared with the Hadoop-LATE algorithm, the proposed method saves 17% of the processing time for the same amount of tasks, which indicates that it is better suited to improving the overall performance of the cluster.

Key words: anomaly detection, MapReduce performance optimization, speculative execution, heterogeneous environment
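
The two steps summarized in the abstract, detecting slow nodes and then moving their tasks to normal nodes in the same network segment, can be illustrated with a minimal sketch. The abstract does not say which anomaly detector is used, so the modified z-score rule below is only an assumed stand-in, and the function names (`find_slow_nodes`, `same_segment`, `pick_target`), the per-node progress rates, and the /24 segment size are all hypothetical choices made for this example rather than details of the paper or of Hadoop.

```python
import ipaddress
from statistics import median


def find_slow_nodes(progress_rates, threshold=3.5):
    """Flag nodes whose progress rate is an unusually low outlier.

    Assumption: a modified z-score based on the median absolute deviation
    (MAD) stands in for the paper's unspecified anomaly detection model.
    """
    rates = list(progress_rates.values())
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        return set()  # no spread among nodes, so nothing is flagged
    return {
        node
        for node, rate in progress_rates.items()
        if 0.6745 * (med - rate) / mad > threshold  # only slow-side outliers
    }


def same_segment(ip_a, ip_b, prefix=24):
    """Return True if both addresses fall in the same /prefix network."""
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net


def pick_target(slow_node, idle_slots, slow_nodes, prefix=24):
    """Pick a normal node with an idle slot, preferring the same segment."""
    candidates = [
        node
        for node, free in idle_slots.items()
        if free > 0 and node not in slow_nodes
    ]
    if not candidates:
        return None
    local = [n for n in candidates if same_segment(slow_node, n, prefix)]
    pool = local or candidates  # fall back to any normal node
    return max(pool, key=lambda n: idle_slots[n])


# Hypothetical cluster: nodes are keyed by IP, values are progress rates.
rates = {
    "10.0.1.1": 1.00,
    "10.0.1.2": 0.95,
    "10.0.1.3": 0.10,
    "10.0.2.1": 1.05,
    "10.0.2.2": 0.98,
}
idle = {"10.0.1.1": 0, "10.0.1.2": 1, "10.0.2.1": 3, "10.0.1.3": 2}

slow = find_slow_nodes(rates)
target = pick_target("10.0.1.3", idle, slow)
print(slow, target)
```

In this toy cluster the node with the far lower progress rate is flagged, and its task goes to the same-segment node that has a free slot even though a node in another segment has more free slots. That preference is the locality trade-off the abstract describes.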

CLC number: