Journal of Computer Applications, 2019, Vol. 39, Issue (6): 1589-1594. DOI: 10.11772/j.issn.1001-9081.2018122592

• 2018 National Annual Conference on High Performance Computing (HPC China 2018)

Microoperation-based parameter auto-optimization method of Hadoop

LI Yunshu1, TENG Fei1,2, LI Tianrui1,2   

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China;
  2. Key Laboratory of Cloud Computing and Intelligent Technology of Sichuan Province (Southwest Jiaotong University), Chengdu, Sichuan 611756, China
  • Received: 2018-12-12 Revised: 2019-03-25 Online: 2019-06-10 Published: 2019-06-17
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61573292) and the Science and Technology Project of Sichuan Province (2019YJ0214).


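  • Corresponding author: TENG Fei
  • About the authors: LI Yunshu (李耘书), born in 1991, male, from Guangyuan, Sichuan, M.S. candidate; research interests: cloud computing, machine learning. TENG Fei (滕飞), born in 1984, female, from Tai'an, Shandong, associate professor, Ph.D., CCF member; research interests: cloud computing, medical informatics, industrial big data mining. LI Tianrui (李天瑞), born in 1969, male, from Putian, Fujian, professor, Ph.D., CCF member; research interests: intelligent information processing, data mining, cloud computing.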

Abstract: As a large-scale distributed data processing framework, Hadoop has been widely used in industry in recent years. Manual and experience-based parameter tuning are ineffective because the job execution process is complex and the parameter space is large. To solve this problem, a method and an analytical framework for automatic Hadoop parameter optimization were proposed. Firstly, the execution process of a job was decomposed into microoperations, defined at the finer granularity that tunable parameters directly affect, so that the relationship between the parameters and the execution time of a single microoperation could be analyzed. Then, the job execution process was reconstructed from these microoperations to build a model relating the parameters to the execution time of the whole job. Finally, search-based optimization algorithms were applied to this model to obtain optimized system parameters quickly and efficiently. Experiments were conducted on two types of jobs, terasort and wordcount. The results show that, compared with the default parameter settings, the proposed method reduces job execution time by at least 41% for terasort and at least 30% for wordcount. The proposed method can thus effectively improve the execution efficiency of Hadoop jobs.
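
The three steps in the abstract (per-microoperation cost, job-level reconstruction, search over parameters) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the microoperation names, the unit costs, the two parameters (`sort_buffer_mb`, `num_reducers`), the slot counts and the use of random search are hypothetical stand-ins and are not taken from the paper, whose actual model and search algorithms are not given in the abstract.

```python
import math
import random

# Hypothetical unit costs per microoperation (seconds per MB); not measured values.
UNIT_COST = {"read": 0.010, "sort": 0.004, "spill": 0.015, "merge": 0.008, "reduce": 0.020}
REDUCER_STARTUP_S = 2.0  # assumed fixed overhead per reduce wave
SPLIT_MB = 128           # assumed input split size
MAP_SLOTS = 8            # assumed number of concurrent map tasks
REDUCE_SLOTS = 4         # assumed number of concurrent reduce tasks

# Assumed baseline configuration used for comparison.
DEFAULT_PARAMS = {"sort_buffer_mb": 100, "num_reducers": 1}


def map_task_time(sort_buffer_mb):
    """Step 1: time of one map task as a sum of microoperation times."""
    spills = math.ceil(SPLIT_MB / sort_buffer_mb)
    t = (UNIT_COST["read"] + UNIT_COST["sort"] + UNIT_COST["spill"]) * SPLIT_MB
    # Merging is only needed with more than one spill file (log2(1) == 0).
    t += UNIT_COST["merge"] * SPLIT_MB * math.log2(spills)
    return t


def job_time(input_mb, params):
    """Step 2: rebuild the whole-job time from the microoperation model."""
    map_tasks = math.ceil(input_mb / SPLIT_MB)
    map_waves = math.ceil(map_tasks / MAP_SLOTS)
    reducers = params["num_reducers"]
    reduce_waves = math.ceil(reducers / REDUCE_SLOTS)
    per_reducer_mb = input_mb / reducers
    reduce_time = reduce_waves * (REDUCER_STARTUP_S + UNIT_COST["reduce"] * per_reducer_mb)
    return map_waves * map_task_time(params["sort_buffer_mb"]) + reduce_time


def random_search(input_mb, trials=200, seed=0):
    """Step 3: search the parameter space on the model instead of on the cluster."""
    rng = random.Random(seed)
    best_params = dict(DEFAULT_PARAMS)
    best_time = job_time(input_mb, best_params)
    for _ in range(trials):
        candidate = {
            "sort_buffer_mb": rng.choice([32, 64, 100, 128, 256]),
            "num_reducers": rng.randint(1, 16),
        }
        t = job_time(input_mb, candidate)
        if t < best_time:
            best_params, best_time = candidate, t
    return best_params, best_time


if __name__ == "__main__":
    params, modeled = random_search(10 * 1024)
    print("default:", job_time(10 * 1024, DEFAULT_PARAMS))
    print("tuned:  ", modeled, params)
```

Because the search evaluates a closed-form model rather than running jobs, each candidate costs microseconds, which is what makes applying generic search algorithms practical in this kind of approach.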

Key words: Hadoop, parameter optimization, microoperation, reconstruction, search algorithm

