Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (11): 3227-3230. DOI: 10.11772/j.issn.1001-9081.2014.11.3227

• Advanced computing •

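Three-queue job scheduling algorithm based on Hadoop

ZHU Jie, ZHAO Hong, LI Wenrui

  1. School of Mathematics and Information Technology, Nanjing Xiaozhuang University, Nanjing Jiangsu 211171, China
  • Received: 2014-05-16 Revised: 2014-06-27 Online: 2014-11-01 Published: 2014-12-01
  • Contact: ZHU Jie
  • About the authors: ZHU Jie (born 1979), female, from Taizhou, Jiangsu; lecturer, M.S.; research interests: cloud computing, distributed computing. ZHAO Hong (born 1982), female, from Harbin, Heilongjiang; lecturer, Ph.D.; research interests: artificial intelligence, distributed computing. LI Wenrui (born 1981), female, from Kaifeng, Henan; lecturer, Ph.D.; research interests: cloud computing, service computing.
  • Supported by:

    Priority Academic Program Development of Jiangsu Higher Education Institutions; Natural Science Research Project of Jiangsu Higher Education Institutions; Research Project of Nanjing Xiaozhuang University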

Abstract:

Single-queue job scheduling in a homogeneous Hadoop cluster makes short jobs wait and leaves resources underutilized. Multi-queue scheduling improves fairness and execution efficiency, but most multi-queue algorithms require manual parameter configuration, let queues encroach on each other's resources, and are complex. To address these problems, a three-queue job scheduling algorithm is proposed. It distinguishes job types, adjusts job priorities dynamically, configures a shared resource pool, and supports job preemption, in order to balance the demands of different jobs, simplify the scheduling flow for normal jobs, and improve parallel execution. The algorithm was compared experimentally with the First In First Out (FIFO) algorithm in three scenarios: a high proportion of short jobs, a balanced mix of all job types, and mostly normal jobs with occasional long and short jobs. In all three scenarios the three-queue algorithm ran in less time than FIFO. The results show that the efficiency gain is not significant when short jobs dominate, but is pronounced when all job types coexist in balanced proportions. This is consistent with the design goals of prioritizing short jobs, simplifying the flow for normal jobs, and still accommodating long jobs, and it improves overall job execution efficiency.
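As a rough illustration of the idea of separating jobs into three queues and adjusting priority dynamically, the following Python sketch may help. It is not the authors' algorithm: the abstract gives no thresholds or priority formula, so the class name `ThreeQueueScheduler`, the cutoffs `SHORT_MAX` and `LONG_MIN`, the `aging_limit` rule, and the assumption that each job's runtime can be estimated in advance are all hypothetical. The shared resource pool and job preemption are omitted entirely.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical cutoffs in seconds of estimated runtime; not taken from the paper.
SHORT_MAX = 60
LONG_MIN = 600


@dataclass
class Job:
    name: str
    est_runtime: float  # estimated runtime in seconds (assumed to be known)
    waited: int = 0     # number of scheduling rounds spent waiting


class ThreeQueueScheduler:
    """Toy scheduler with short, normal, and long queues (illustrative only)."""

    def __init__(self, aging_limit=3):
        self.queues = {"short": deque(), "normal": deque(), "long": deque()}
        self.aging_limit = aging_limit  # hypothetical starvation threshold

    @staticmethod
    def classify(job):
        # Distinguish job types by estimated runtime.
        if job.est_runtime <= SHORT_MAX:
            return "short"
        if job.est_runtime >= LONG_MIN:
            return "long"
        return "normal"

    def submit(self, job):
        self.queues[self.classify(job)].append(job)

    def next_job(self):
        chosen = None
        # Dynamic priority: a normal or long job at the head of its queue
        # that has waited aging_limit rounds or more is served first.
        for kind in ("normal", "long"):
            queue = self.queues[kind]
            if queue and queue[0].waited >= self.aging_limit:
                chosen = queue.popleft()
                break
        # Otherwise serve short jobs first, then normal, then long.
        if chosen is None:
            for kind in ("short", "normal", "long"):
                if self.queues[kind]:
                    chosen = self.queues[kind].popleft()
                    break
        # Every job still queued has now waited one more round.
        for queue in self.queues.values():
            for job in queue:
                job.waited += 1
        return chosen
```

Under these assumptions, short jobs are normally dispatched first, while the aging rule keeps a run of short jobs from starving normal and long jobs indefinitely. A FIFO scheduler, by contrast, would make a short job wait behind any long job submitted before it.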

CLC number: