Journal of Computer Applications, 2020, Vol. 40, Issue (12): 3594-3603. DOI: 10.11772/j.issn.1001-9081.2020050632

• Computer software technology

Performance optimization of distributed file system based on new type storage devices

DONG Cong1,2, ZHANG Xiao2,3, CHENG Wendi2,3, SHI Jia1,2   

  1. School of Software, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China;
    2. Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology(Northwestern Polytechnical University), Xi'an Shaanxi 710129, China;
    3. College of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  • Received: 2020-05-13 Revised: 2020-06-19 Online: 2020-12-10 Published: 2020-08-14
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2018YFB1004400) and the Beijing Natural Science Foundation-Haidian Original Innovation Union Foundation (L192027).

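  • Corresponding author: ZHANG Xiao (1978-), male, born in Nanyang, Henan, associate professor, Ph.D., CCF senior member. His research interests include computer storage systems, cloud computing, and big data storage and management. zhangxiao@nwpu.edu.cn
  • About the authors: DONG Cong (1996-), female, born in Xi'an, Shaanxi, M.S. candidate. Her research interests include performance optimization of distributed file systems, big data storage and management, and new types of storage devices. CHENG Wendi (1995-), female, born in Xianyang, Shaanxi, M.S. candidate. Her research interests include performance optimization of distributed file systems and non-volatile memory optimization. SHI Jia (1995-), male, born in Xianyang, Shaanxi, M.S. candidate. His research interests include performance optimization of distributed file systems and optimization of non-volatile memory read/write processes.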

Abstract: The I/O performance of new types of storage devices is usually an order of magnitude higher than that of traditional Solid State Drives (SSDs). However, simply replacing SSDs with such devices does not significantly improve the performance of a distributed file system, which shows that current distributed file systems cannot fully exploit the performance of new storage devices. To address this problem, the data write process and transmission process of the Hadoop Distributed File System (HDFS) were analyzed quantitatively. Quantitative analysis of the time spent in each stage of the HDFS write process showed that, among all stages of writing data, data transmission between nodes accounts for the largest share of time. Therefore, a corresponding optimization strategy was proposed: data transmission and data processing were parallelized through asynchronous writing, so that the processing stages of different data packets overlap with each other. This shortens the total processing time of data writing and thereby improves the write performance of HDFS. Experimental results show that the proposed scheme improves HDFS write throughput by 15%-24% and reduces the overall write execution time by 28%-36%.
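Why overlapping the stages helps can be illustrated with a deliberately simplified two-stage model. The sketch below is not HDFS code and not the authors' implementation; the function names `sequential_makespan` and `pipelined_makespan`, and the assumption that each packet passes through exactly one "processing" stage followed by one "transmission" stage, are hypothetical. In the synchronous case a packet must finish both stages before the next one starts; in the asynchronous case packet i+1 is processed while packet i is still being transmitted, so for n identical packets the total time drops from n(p+t) to p + t + (n-1)·max(p, t).

```python
from typing import Sequence


def _check(process: Sequence[float], transmit: Sequence[float]) -> None:
    # Both lists must describe the same packets, one duration per packet.
    if len(process) != len(transmit):
        raise ValueError("process and transmit must have the same length")


def sequential_makespan(process: Sequence[float], transmit: Sequence[float]) -> float:
    """Synchronous write (hypothetical model): each packet is processed and
    transmitted completely before the next packet is touched."""
    _check(process, transmit)
    return sum(p + t for p, t in zip(process, transmit))


def pipelined_makespan(process: Sequence[float], transmit: Sequence[float]) -> float:
    """Asynchronous write (hypothetical model): processing and transmission run
    as two overlapping pipeline stages.

    Stage 1 processes packets back to back; stage 2 starts transmitting a packet
    only when that packet has been processed and the link is free.
    """
    _check(process, transmit)
    process_done = 0.0   # time at which stage 1 finishes the current packet
    transmit_done = 0.0  # time at which stage 2 finishes the previous packet
    for p, t in zip(process, transmit):
        process_done += p                          # packet is ready to send
        start = max(process_done, transmit_done)   # wait for packet and link
        transmit_done = start + t                  # packet fully transmitted
    return transmit_done


if __name__ == "__main__":
    # Four packets whose transmission (3.0) dominates processing (1.0),
    # mirroring the observation that inter-node transmission is the bottleneck.
    proc, xmit = [1.0] * 4, [3.0] * 4
    print(sequential_makespan(proc, xmit))  # 16.0
    print(pipelined_makespan(proc, xmit))   # 13.0
```

With these illustrative durations the overlapped schedule finishes in 13.0 time units instead of 16.0, because only the first packet's processing time remains exposed. The numbers are arbitrary and are not the paper's measurements; the model also ignores acknowledgements, replication, and contention, so it shows only the direction of the effect, not its size.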

Key words: distributed file system, Hadoop Distributed File System (HDFS), non-volatile memory, performance optimization, asynchronous write
