Improvement of MapReduce Shuffle Performance

Abstract

MapReduce is a programming model and, as a core component of Hadoop, plays a key role in the performance and efficiency of Hadoop's big-data processing. Copying large amounts of result data from the Map side to the Reduce side takes a long time. To address this, this paper proposes merging, on each Map node, the large volume of temporary result data produced by all Map tasks of the same job. This node-level merge replaces the original MapReduce mechanism, which merges the result data of each Map task separately. The scheme reduces the amount of output data on each Map node. That in turn substantially reduces network traffic across the cluster and saves the time the Reduce side spends copying Map output. As a result, MapReduce jobs finish sooner and overall MapReduce performance improves.

Key words: Hadoop, MapReduce, Shuffle, performance
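The following Python sketch illustrates the idea. It simulates one Map node that runs several Map tasks of a single word-count job, and it counts how many intermediate records each strategy would hand to the shuffle. This is not the authors' implementation. It rests on several assumptions: the reduce function is associative and commutative (a sum), the record count stands in for shuffle data volume, and all function names (`map_task`, `combine`, `per_task_combine`, `node_level_combine`, `reduce_all`) are hypothetical and not part of the Hadoop API.

```python
from collections import Counter
from typing import Dict, List, Tuple

# An intermediate (key, value) pair emitted by a Map task.
Record = Tuple[str, int]


def map_task(split: List[str]) -> List[Record]:
    """Word-count map function (hypothetical): emit (word, 1) for every word."""
    return [(word, 1) for line in split for word in line.split()]


def combine(records: List[Record]) -> List[Record]:
    """Sum values per key; valid only because addition is associative and commutative."""
    totals: Counter = Counter()
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def per_task_combine(splits: List[List[str]]) -> List[Record]:
    """Baseline: each Map task combines only its own output; the node ships all of it."""
    shipped: List[Record] = []
    for split in splits:
        shipped.extend(combine(map_task(split)))
    return shipped


def node_level_combine(splits: List[List[str]]) -> List[Record]:
    """Sketch of the proposed scheme: merge the outputs of all Map tasks
    of the same job on this node, then combine once before shipping."""
    merged: List[Record] = []
    for split in splits:
        merged.extend(map_task(split))
    return combine(merged)


def reduce_all(shipped: List[Record]) -> Dict[str, int]:
    """Reduce side: sum every value received for each key."""
    return dict(combine(shipped))


if __name__ == "__main__":
    # Three hypothetical Map tasks of the same job scheduled on one node.
    splits = [["a b a", "c"], ["a c", "b b"], ["c a"]]
    baseline = per_task_combine(splits)
    proposed = node_level_combine(splits)
    print(len(baseline), len(proposed))  # prints "8 3"
    # Both strategies give the reducers identical final counts.
    assert reduce_all(baseline) == reduce_all(proposed)
```

On this toy input, per-task combining ships 8 records and the node-level merge ships 3, while the reducers compute the same totals either way. The size of the saving depends on how many keys the Map tasks on a node share. If the tasks' keys are disjoint, both strategies ship the same number of records.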

CLC number: TP311.5

熊倩张? 郭明. Improvement of MapReduce Shuffle performance[J]. Journal of Computer Applications (计算机应用).

