Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (4): 918-922. DOI: 10.11772/j.issn.1001-9081.2016.04.0918

• Advanced computing •

Hadoop adaptive task scheduling algorithm based on computation capacity difference between node sets

ZHU Jie1,2, LI Wenrui1,2, WANG Jiangping1,2, ZHAO Hong1,2

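  1. School of Information Engineering, Nanjing Xiaozhuang University, Nanjing 211171;
    2. Nanjing Key Laboratory of Trusted Cloud Computing and Big Data Analysis (Nanjing Xiaozhuang University), Nanjing 211171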
  • Received: 2015-08-30 Revised: 2015-11-07 Online: 2016-04-10 Published: 2016-04-08
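  • Corresponding author: ZHU Jie
  • About the authors: ZHU Jie (born 1979), female, from Taizhou, Jiangsu, lecturer, master's degree; main research interests: cloud computing and distributed computing. LI Wenrui (born 1981), female, from Kaifeng, Henan, associate professor, Ph.D.; main research interests: cloud computing and service computing. WANG Jiangping (born 1965), female, from Nantong, Jiangsu, professor, Ph.D.; main research interests: cloud computing and distributed computing. ZHAO Hong (born 1982), female, from Harbin, Heilongjiang, lecturer, Ph.D.; main research interests: artificial intelligence and distributed computing.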
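  • Supported by:
    the National Natural Science Foundation of China (61202136); the Science and Technology Project of Jiangsu Province (BY2013095-3-11); the Natural Science Research Project of Jiangsu Higher Education Institutions (13KJD520007); the Scientific Research Project of Nanjing Xiaozhuang University (2012NXY14, 2013NXY99).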

Hadoop adaptive task scheduling algorithm based on computation capacity difference between node sets

ZHU Jie1,2, LI Wenrui1,2, WANG Jiangping1,2, ZHAO Hong1,2   

  1. School of Information Engineering, Nanjing Xiaozhuang University, Nanjing Jiangsu 211171, China;
    2. Nanjing Key Laboratory of Trusted Cloud Computing and Big Data Analysis(Nanjing Xiaozhuang University), Nanjing Jiangsu 211171, China
  • Received: 2015-08-30 Revised: 2015-11-07 Online: 2016-04-10 Published: 2016-04-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61202136), the Science and Technology Project of Jiangsu Province (BY2013095-3-11), the Natural Science Research Project of Jiangsu Higher Education Institutions (13KJD520007), and the Scientific Research Project of Nanjing Xiaozhuang University (2012NXY14, 2013NXY99).

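Abstract: To address the fixed task progress proportions and the passive selection of slow tasks in speculative execution algorithms for heterogeneous clusters, an adaptive task scheduling algorithm based on the computation capacity difference between fast and slow node sets is proposed. The algorithm quantifies the computation capacity difference between node sets to schedule tasks by node set, and uses dynamic feedback on node and task rates to update the fast and slow node sets promptly, which improves node-set resource utilization and task parallelism. Within the two node sets, dynamically adjusted task progress proportions are used to identify slow tasks, and a fast node is actively selected to run a backup task for each slow task by substitute execution, which improves task execution efficiency. Experimental comparison with the Longest Approximate Time to End (LATE) algorithm shows that the proposed algorithm shortens execution time by 5.21%, 20.51% and 23.86% on a short job set, a mixed-type job set, and a mixed-type job set with node performance degradation, respectively, and launches markedly fewer backup tasks than LATE. The proposed algorithm lets tasks actively adapt to node differences and effectively improves overall job execution efficiency while reducing the number of backup tasks.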

Key words: Hadoop, computation capacity, adaptive, task scheduling, speculative execution

Abstract: To address the fixed task progress proportions and the passive selection of slow tasks in speculative task execution algorithms for heterogeneous clusters, an adaptive task scheduling algorithm based on the computation capacity difference between node sets was proposed. The computation capacity difference between node sets was quantified so that tasks could be scheduled separately on the fast and slow node sets, and dynamic feedback on node and task speeds was used to update the fast and slow node sets promptly, improving the resource utilization rate and task parallelism. Within the two node sets, task progress proportions were adjusted dynamically to identify slow tasks more accurately, and a fast node was actively selected to run the backup task for each slow task by substitute execution, improving task execution efficiency. The experimental results showed that, compared with the Longest Approximate Time to End (LATE) algorithm, the proposed algorithm reduced the running time by 5.21%, 20.51% and 23.86% on a short job set, a mixed-type job set, and a mixed-type job set with node performance degradation, respectively, and significantly reduced the number of backup tasks launched. The proposed algorithm lets tasks adapt to node differences and effectively improves overall job execution efficiency while reducing the number of backup tasks.

Key words: Hadoop, computation capacity, adaptive, task scheduling, speculative execution
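
The scheduling idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the names `Task`, `split_node_sets`, `time_left` and `pick_backup` are hypothetical, the fast/slow split uses a simple mean-rate threshold rather than the paper's quantified capacity difference, and the remaining-time estimate follows the heuristic LATE is generally described with (remaining progress divided by observed progress rate). The dynamic adjustment of task progress proportions is not modeled.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    task_id: str
    node: str        # node currently running the task
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds since the task started


def split_node_sets(node_rates):
    """Split nodes into (fast, slow) sets around the mean observed rate.

    Assumption: the mean-rate threshold is illustrative, not the paper's rule.
    """
    threshold = mean(node_rates.values())
    fast = {n for n, r in node_rates.items() if r >= threshold}
    return fast, set(node_rates) - fast


def progress_rate(task):
    """Observed progress per second for a running task."""
    return task.progress / task.elapsed


def time_left(task):
    """LATE-style estimate: remaining progress divided by progress rate."""
    return (1.0 - task.progress) / progress_rate(task)


def pick_backup(tasks, node_rates):
    """Return (task_id, fast_node) for one backup task, or None."""
    fast, slow = split_node_sets(node_rates)
    running = [t for t in tasks if 0.0 < t.progress < 1.0 and t.elapsed > 0.0]
    if not running or not fast:
        return None
    avg_rate = mean(progress_rate(t) for t in running)
    # Slow-task candidates: running on a slow node and slower than average.
    slow_tasks = [t for t in running
                  if t.node in slow and progress_rate(t) < avg_rate]
    if not slow_tasks:
        return None
    victim = max(slow_tasks, key=time_left)           # expected to finish last
    target = max(fast, key=lambda n: node_rates[n])   # fastest node gets the backup
    return victim.task_id, target


# Hypothetical cluster state: node rates are in arbitrary units.
node_rates = {"n1": 4.0, "n2": 3.5, "n3": 1.0}
tasks = [
    Task("t1", "n1", 0.8, 40.0),
    Task("t2", "n3", 0.2, 40.0),
    Task("t3", "n2", 0.6, 40.0),
]
print(pick_backup(tasks, node_rates))  # ('t2', 'n1')
```

Recomputing `split_node_sets` on every call stands in for the dynamic feedback the abstract mentions: as measured node rates change, a node can move between the fast and slow sets without any extra bookkeeping.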

CLC Number: