MapReduce performance optimization based on anomaly detection model in heterogeneous cloud environment

doi:10.11772/j.issn.1001-9081.2015.09.2476

Journal of Computer Applications, 2015, Vol. 35, Issue (9): 2476-2481.

HOU Jialin, WANG Jiajun, NIE Hongyu

School of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 610031, China

Received: 2015-04-30 Revised: 2015-07-08 Online: 2015-09-10 Published: 2015-09-17
Corresponding author: HOU Jialin (born 1990), male, from Luoyang, Henan, master's student; main research interests: MapReduce parallel computing, Tibetan public opinion monitoring; houjia_lin@foxmail.com
About the authors: WANG Jiajun (born 1990), male, from Gaocheng, Hebei, master's student; main research interest: Tibetan public opinion monitoring. NIE Hongyu (born 1989), female, from Neijiang, Sichuan, master's student; main research interest: Tibetan public opinion monitoring.
Supported by:
Special Topic Research Project of the Fundamental Research Funds for the Central Universities (SWJTU11ZT08); the "12th Five-Year" Scientific Research Planning Project of the State Language Commission (YB125-49).

Abstract

Abstract: To address the problem of selecting stragglers, a method was proposed that selects them with an anomaly detection model of the kind commonly used in fault diagnosis. First, an anomaly detection algorithm was used to find the slow nodes in the cluster. Then the MapReduce task assignment algorithm and speculative execution algorithm were improved so that no new tasks were assigned to slow nodes, and the tasks already on slow nodes were reassigned to normal nodes with idle task slots. Because nodes in the same network segment are usually physically close, which speeds up data transfer, the improved speculative execution algorithm was the first to reassign tasks on slow nodes to normal nodes in the same network segment. The experimental results showed that the anomaly detection algorithm quickly detected the abnormal nodes and that, compared with the Hadoop-LATE algorithm, the proposed approach shortened the cluster's task processing time by 17% for the same task load, indicating that it improves the overall performance of the cluster.

Key words: anomaly detection, MapReduce performance optimization, speculative execution, heterogeneous environment
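
The two steps described in the abstract (detect slow nodes, then move their tasks to normal nodes with idle slots in the same network segment) can be illustrated with a minimal sketch. The abstract does not say which anomaly detection model or node metrics the authors used, so everything specific here is an assumption for illustration: the detector is a modified z-score on median and MAD used as a stand-in, and the `Node` class, the `progress_rate` metric, the 3.5 threshold, and the /24 segment size are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from statistics import median


@dataclass
class Node:
    name: str
    ip: str
    progress_rate: float  # hypothetical metric: task progress per second
    idle_slots: int


def find_slow_nodes(nodes, threshold=3.5):
    """Flag nodes whose progress rate is an unusually low outlier.

    Stand-in detector (not the paper's model): modified z-score
    computed from the median and the median absolute deviation (MAD).
    """
    rates = [n.progress_rate for n in nodes]
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        return set()  # all nodes behave alike, nothing to flag
    slow = set()
    for n in nodes:
        score = 0.6745 * (n.progress_rate - med) / mad
        if score < -threshold:  # only the slow side counts as a straggler
            slow.add(n.name)
    return slow


def same_segment(ip_a, ip_b, prefix=24):
    """True if both addresses fall in the same /prefix network (assumed size)."""
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net


def pick_target(slow_node, nodes, slow_names):
    """Choose a normal node with an idle slot, preferring the same segment."""
    candidates = [
        n for n in nodes if n.name not in slow_names and n.idle_slots > 0
    ]
    if not candidates:
        return None
    # Same-segment nodes sort first, then nodes with more idle slots.
    candidates.sort(
        key=lambda n: (not same_segment(slow_node.ip, n.ip), -n.idle_slots)
    )
    return candidates[0]
```

In this sketch a scheduler would call `find_slow_nodes` periodically, skip the flagged nodes when assigning new tasks, and call `pick_target` for each task still running on a flagged node. A node in another segment is used only when no same-segment node has an idle slot.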

CLC Number:

TP302

Cite this article: HOU Jialin, WANG Jiajun, NIE Hongyu. MapReduce performance optimization based on anomaly detection model in heterogeneous cloud environment[J]. Journal of Computer Applications, 2015, 35(9): 2476-2481.

