Resource matching maximum set job scheduling algorithm under Hadoop

doi:10.11772/j.issn.1001-9081.2015.12.3383

Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (12): 3383-3386. DOI: 10.11772/j.issn.1001-9081.2015.12.3383


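ZHU Jie (朱洁)^1,2, LI Wenrui (李雯睿)^1,2, ZHAO Hong (赵红)^1,2, LI Ying (李滢)^1,2

1. School of Information Engineering, Nanjing Xiaozhuang University, Nanjing, Jiangsu 211171, China;
2. Key Laboratory of Trusted Cloud Computing and Big Data Analysis, Nanjing, Jiangsu 211171, China

Received: 2015-06-10 Revised: 2015-07-20 Online: 2015-12-10 Published: 2015-12-10
Corresponding author: ZHU Jie (born 1979), female, from Taizhou, Jiangsu; lecturer, M.S.; main research interests: cloud computing, distributed computing.
About the authors: LI Wenrui (born 1981), female, from Kaifeng, Henan; associate professor, Ph.D.; main research interests: cloud computing, service computing. ZHAO Hong (born 1982), female, from Harbin, Heilongjiang; lecturer, Ph.D.; main research interests: artificial intelligence, distributed computing. LI Ying (born 1975), female, from Harbin, Heilongjiang; lecturer, Ph.D.; main research interests: service computing.
Supported by:
National Natural Science Foundation of China (61202136); Science and Technology Project of Jiangsu Province (BY2013095-3-11); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJD520007); Scientific Research Project of Nanjing Xiaozhuang University (2012NXY14, 2013NXY99).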


Abstract

Abstract: In current hierarchical-queue job scheduling algorithms, jobs that request a high proportion of resources execute inefficiently. To address this problem, a resource matching maximum set algorithm is proposed. The algorithm analyzes job characteristics and introduces the percentage of completion, waiting time, priority, and number of reschedulings as factors of an urgency value. Jobs with a high proportion of resources or a long waiting time are considered first, which improves fairness among jobs. A double-queue structure is used to preferentially select jobs with high urgency values within the total amount of available resources, and, when job sets with different resource proportions are compared, the set containing the largest number of jobs is selected in order to keep scheduling balanced. A worked comparison with the Max-min fairness algorithm shows that the proposed algorithm decreases the average waiting time of the job set and improves resource utilization. The experimental results show that, for a single-type job set consisting of jobs with different resource proportions, the proposed algorithm reduces the running time by 18.73%, and reduces the running time of the jobs with a high proportion of resources by 27.26%; for a mixed-type job set, the corresponding reductions are 22.36% and 30.28%. The proposed algorithm effectively reduces the waiting of jobs with a high proportion of resources and improves overall job execution efficiency.

Key words: Hadoop, hierarchical queue, job scheduling, maximum set, Max-min fairness
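The selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the urgency formula, the weights, or the exact role of the two queues, so everything below is an assumption made for illustration: the `Job` fields, the linear `urgency` function and its weights, the choice of ordering one queue by urgency and the other by resource demand, and the tie-breaking rule are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str          # assumed unique per job
    demand: int        # resource units requested (hypothetical unit)
    completion: float  # fraction of the job already completed, 0.0 to 1.0
    waiting: float     # time spent waiting, in seconds
    priority: int      # larger means more important
    reschedules: int   # how many times the job has been rescheduled


def urgency(job, weights=(1.0, 0.01, 1.0, 0.5)):
    """Hypothetical urgency value: a weighted sum of the four factors
    named in the abstract. The weights are illustrative only."""
    w_done, w_wait, w_prio, w_resched = weights
    return (w_done * job.completion + w_wait * job.waiting
            + w_prio * job.priority + w_resched * job.reschedules)


def fill(order, capacity, chosen):
    """Greedily add jobs from `order` that still fit in the free capacity."""
    picked = list(chosen)
    free = capacity - sum(j.demand for j in picked)
    for job in order:
        if job not in picked and job.demand <= free:
            picked.append(job)
            free -= job.demand
    return picked


def select_jobs(jobs, capacity):
    """Admit the most urgent job that fits, then build one candidate set
    per queue and keep the set with the most jobs (assumed reading of
    'maximum set'); ties are broken by total urgency."""
    by_urgency = sorted(jobs, key=urgency, reverse=True)  # queue 1 (assumed)
    by_demand = sorted(jobs, key=lambda j: j.demand)      # queue 2 (assumed)
    seed = next((j for j in by_urgency if j.demand <= capacity), None)
    if seed is None:
        return []
    candidates = [fill(by_urgency, capacity, [seed]),
                  fill(by_demand, capacity, [seed])]
    return max(candidates,
               key=lambda s: (len(s), sum(urgency(j) for j in s)))


if __name__ == "__main__":
    sample = [
        Job("big", 6, 0.0, 900.0, 1, 2),
        Job("a", 2, 0.0, 100.0, 1, 0),
        Job("b", 2, 0.0, 50.0, 1, 0),
        Job("c", 3, 0.0, 300.0, 1, 0),
    ]
    print([j.name for j in select_jobs(sample, capacity=10)])
```

With the sample jobs and 10 resource units, this sketch admits the long-waiting 6-unit job first and then prefers the candidate set that adds the two 2-unit jobs (three jobs in total) over the one that adds only the 3-unit job (two jobs). It is meant only to show how an urgency ordering and a largest-set comparison can be combined, not to reproduce the reported results.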


ZHU Jie, LI Wenrui, ZHAO Hong, LI Ying. Resource matching maximum set job scheduling algorithm under Hadoop[J]. Journal of Computer Applications, 2015, 35(12): 3383-3386.

