Scheduling method for big data tasks

doi:10.11772/j.issn.1001-9081.2020030348

Journal of Computer Applications, 2020, Vol. 40, Issue 10: 2923-2928. DOI: 10.11772/j.issn.1001-9081.2020030348

LI Ziying, SHI Zhenguo

School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226001, China

Received: 2020-03-24 Revised: 2020-05-08 Online: 2020-05-18 Published: 2020-10-10
Corresponding author: SHI Zhenguo
About the authors: LI Ziying (born 1996), female, from Nanjing, Jiangsu, is a master's student whose research interests include big data and artificial intelligence. SHI Zhenguo (born 1963), male, from Nantong, Jiangsu, is an associate professor with a Ph.D. and a CCF member whose research interests include artificial intelligence and machine learning.
Supported by:
This work is partially supported by the Natural Science Foundation of Jiangsu Province (18KJB520041), the Science and Technology Project of Nantong City (JC2018132), and the Open Project of the Key Laboratory of Safety-Critical Software of the Ministry of Industry and Information Technology at Nanjing University of Aeronautics and Astronautics (NJ2018014).

Abstract

Abstract: To address the lack of rationality in dividing big data tasks and allocating resources to them during big data processing, a scheduling method for big data tasks was proposed. First, scheduling theory was introduced to handle big data tasks, helping to establish a reasonable management system for them and to standardize the task processing flow. Then, based on the nature of big data tasks, the datasets were analyzed and processed, and a decision table was introduced to perform attribute reduction, reducing the data volume of big data analysis tasks and improving analysis efficiency. Finally, the fuzzy comprehensive evaluation method was adopted, and its result was used as the basis for task scheduling, improving the rationality of task resource allocation. Experimental results on University of California Irvine (UCI) datasets show that the average prediction accuracy of the proposed scheduling algorithm is 7.42 percentage points higher than that of the Naive Bayes (NB) algorithm, 5.16 percentage points higher than that of the error Back Propagation (BP) algorithm, and 3.74 percentage points higher than that of the Root Mean Square Prop (RMSProp) algorithm. For datasets with a large number of features, the prediction accuracy of the proposed algorithm is significantly higher than that of the other algorithms. Compared with the Heterogeneous Critical Path First Synthesis (HCPFS) algorithm and the Heterogeneous Improved Priority List for Task Scheduling (HIPLTS) algorithm, the proposed algorithm reduces the average Scheduling Length Ratio (SLR) by 12.14% and 4.56% respectively, and increases the average speedup ratio by 7.14% and 42.56% respectively, showing that it can effectively improve the efficiency of task scheduling in big data systems. Comprehensive analysis shows that the proposed method achieves high prediction accuracy and is efficient and reliable.
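
The attribute reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's reduction procedure, so this example assumes a standard rough-set approach: an attribute is dropped when removing it leaves the dependency degree of the decision on the remaining attributes unchanged. The toy table and the names `dependency`, `reduce_attributes`, `size`, `type`, `owner`, and `priority` are hypothetical.

```python
from collections import defaultdict


def dependency(rows, cond_attrs, decision):
    """Fraction of rows whose decision is fully determined by cond_attrs
    (size of the rough-set positive region divided by the table size)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        groups[key].append(row[decision])
    # A group is consistent when all of its rows share one decision value.
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def reduce_attributes(rows, cond_attrs, decision):
    """Backward elimination: drop an attribute whenever doing so leaves the
    dependency degree unchanged (an assumed, generic reduction strategy)."""
    full = dependency(rows, cond_attrs, decision)
    kept = list(cond_attrs)
    for attr in list(cond_attrs):
        trial = [a for a in kept if a != attr]
        if dependency(rows, trial, decision) == full:
            kept = trial
    return kept


# Hypothetical toy decision table: three condition attributes, one decision.
table = [
    {"size": "large", "type": "io", "owner": "a", "priority": "high"},
    {"size": "large", "type": "cpu", "owner": "b", "priority": "high"},
    {"size": "small", "type": "io", "owner": "a", "priority": "low"},
    {"size": "small", "type": "cpu", "owner": "b", "priority": "low"},
]
reduct = reduce_attributes(table, ["size", "type", "owner"], "priority")
print(reduct)
```

In this toy table only `size` is needed to determine `priority`, so the other two columns can be removed before further analysis. The result of backward elimination depends on the order in which attributes are tried, so it yields one reduct, not necessarily the smallest one.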

Key words: big data, task scheduling, decision table, attribute reduction, fuzzy comprehensive evaluation
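
As a rough illustration of how a fuzzy comprehensive evaluation result could serve as a scheduling basis, the sketch below combines a weight vector with a membership matrix and ranks tasks by the resulting score. The abstract does not specify the evaluation factors, grades, weights, or composition operator, so everything here is an assumption: the weighted-average operator, the three factors, the three priority grades, and the names `fuzzy_evaluate` and `rank_tasks` are all hypothetical.

```python
def fuzzy_evaluate(weights, membership):
    """Weighted-average composition B = W . R, normalized to sum to 1.

    weights:    one weight per evaluation factor
    membership: membership[i][j] is the degree to which factor i
                belongs to grade j
    """
    n_grades = len(membership[0])
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def rank_tasks(tasks, weights, grade_values):
    """Order task names by their defuzzified score, highest first."""
    def score(matrix):
        b = fuzzy_evaluate(weights, matrix)
        return sum(bj * v for bj, v in zip(b, grade_values))
    return sorted(tasks, key=lambda name: score(tasks[name]), reverse=True)


# Hypothetical factors: urgency, data volume, resource demand.
weights = [0.5, 0.3, 0.2]
# Hypothetical grades: high, medium, low priority, scored 3, 2, 1.
grade_values = [3, 2, 1]
# One membership matrix per task (rows: factors, columns: grades).
tasks = {
    "task_a": [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
    "task_b": [[0.1, 0.3, 0.6], [0.2, 0.4, 0.4], [0.6, 0.3, 0.1]],
}
order = rank_tasks(tasks, weights, grade_values)
print(order)
```

Here `task_a` leans toward the high-priority grade and is ranked first. Other composition operators (for example max-min) are also common in fuzzy comprehensive evaluation and could give a different ordering.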

CLC number:

TP181

LI Ziying, SHI Zhenguo. Scheduling method for big data tasks[J]. Journal of Computer Applications, 2020, 40(10): 2923-2928.

