Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (11): 3091-3095. DOI: 10.11772/j.issn.1001-9081.2014.11.3091

• Papers from the 2014 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2014) •

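Optimization of small files storage and accessing on Hadoop distributed file system

LI Tie, YAN Cairong, HUANG Yongfeng, SONG Yalong

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2014-07-18 Revised: 2014-07-30 Online: 2014-11-01 Published: 2014-12-01
  • Corresponding author: YAN Cairong
  • About the authors: LI Tie (1989-), male, from Yongzhou, Hunan, M.S. candidate; research interests: distributed storage, distributed computing. YAN Cairong (1978-), female, from Xiantao, Hubei, associate professor, Ph.D.; research interests: parallel computing, distributed computing, big data processing. HUANG Yongfeng (1971-), male, from Tai'an, Shandong, associate professor, Ph.D.; research interests: data mining, machine learning, image processing. SONG Yalong (1988-), male, from Hebi, Henan, M.S. candidate; research interests: distributed storage, distributed computing.
  • Supported by:

    National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities; Natural Science Foundation of Shanghai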

Abstract:

To improve the efficiency of small-file processing in the Hadoop Distributed File System (HDFS), an intelligent small-file storage and access optimization method for HDFS, named SmartFS, is proposed. SmartFS analyzes small-file access logs to obtain user access behavior and builds a probability model of file associations. A merging algorithm based on these associations assembles related small files into large files before they are stored on HDFS. When a file is read from HDFS, a prefetching algorithm based on the same associations fetches related files to speed up access, and a prefetch-based cache replacement algorithm manages the cache space to raise the cache hit rate. Experimental results show that SmartFS reduces the metadata space used by the NameNode in HDFS, reduces the number of interactions between users and HDFS, and improves the storage efficiency and access speed of small files.
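The pipeline described in the abstract (access log, association model, merging, prefetching) can be illustrated with a minimal Python sketch. This is not the SmartFS implementation: the abstract does not define the probability model, so the sketch assumes a simple first-order estimate (how often file b is accessed directly after file a), and all function names, the sample sessions, and the 0.6 threshold are hypothetical. Cache replacement is left out.

```python
from collections import defaultdict


def association_probabilities(sessions):
    """Estimate P(b is accessed right after a) from per-user access sequences.

    Assumption: a first-order "directly followed by" count stands in for the
    paper's file association model, which the abstract does not specify.
    """
    follow = defaultdict(lambda: defaultdict(int))  # follow[a][b]: times b directly followed a
    total = defaultdict(int)                        # total[a]: times a was followed by another file
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                follow[a][b] += 1
                total[a] += 1
    return {a: {b: n / total[a] for b, n in nxt.items()} for a, nxt in follow.items()}


def merge_groups(files, prob, threshold=0.6):
    """Group files linked by an association probability >= threshold (union-find).

    Each group is a candidate set of small files to pack into one large file.
    """
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, nxt in prob.items():
        for b, p in nxt.items():
            if p >= threshold and a in parent and b in parent:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in files:
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values())


def prefetch_candidates(name, prob, k=2):
    """Return up to k files most likely to be accessed right after `name`."""
    ranked = sorted(prob.get(name, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [b for b, _ in ranked[:k]]


# Hypothetical access log, already split into per-user sessions.
sessions = [
    ["index.html", "logo.png", "style.css"],
    ["index.html", "logo.png"],
    ["index.html", "style.css"],
    ["report.pdf", "data.csv"],
    ["report.pdf", "data.csv"],
]
files = sorted({f for seq in sessions for f in seq})
prob = association_probabilities(sessions)
print(merge_groups(files, prob))
print(prefetch_candidates("index.html", prob, k=1))
```

Grouping by connected components keeps the sketch short, but it can chain weakly related files into one group. A real merging algorithm would also need to bound the size of each merged file, for example by the HDFS block size.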

CLC Number: