Journal of Computer Applications, 2015, Vol. 35, Issue (9): 2497-2502. DOI: 10.11772/j.issn.1001-9081.2015.09.2497

• Advanced computing •

Cut-GAR: solution to determine cut-off point in cloud storage system

SHAO Tian1,2, CHEN Guangsheng1,2, JING Weipeng1,2

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China;
  2. Heilongjiang Province Engineering Technology Research Center for Forestry Ecological Big Data Storage and High Performance (Cloud) Computing, Harbin, Heilongjiang 150040, China
  • Received: 2015-04-15 Revised: 2015-06-27 Online: 2015-09-10 Published: 2015-09-17
  • Corresponding author: CHEN Guangsheng (born 1969), male, from Harbin, Heilongjiang; professor, Ph.D.; research interests: forestry big data storage and cloud computing; kjc_chen@163.com
  • About the authors: SHAO Tian (born 1990), female, from Tai'an, Shandong; master's student; research interest: cloud computing. JING Weipeng (born 1979), male, from Hegang, Heilongjiang; associate professor, Ph.D.; research interests: distributed computing and fault-tolerant computing.
  • Supported by:
    Key Program of the Natural Science Foundation of Heilongjiang Province (ZD201403); Special Research Fund for Scientific and Technological Talents of Harbin (2014RFQXJ132).

Abstract: The Hadoop Distributed File System (HDFS) handles small files poorly, in part because the term "small file" has no precise definition. To address this, Cut-off Point via Grey Relational Analysis (Cut-GAR) is proposed as a method for determining the cut-off point between small and large files in a cloud storage system. The method first analyzes how file size relates to three factors: the memory consumed on the NameNode (M), the upload speed measured in MB of Uploaded Files per Second (MUFS), and the read speed measured in MB of Accessed Files per Second (MAFS). This analysis yields three approximately optimal file sizes, FM, FMUFS, and FMAFS, one for each factor. Grey relational analysis is then applied with M, MUFS, and MAFS as the evaluation indexes and file size as the evaluated object, which gives the grey relational degree between each index and the object as well as the weight of each index. The sum of FM, FMUFS, and FMAFS, each multiplied by its corresponding index weight, is taken as the approximately optimal cut-off point. Experimental results show that Cut-GAR strikes a balance among M, MUFS, and MAFS, determines the cut-off point effectively, and improves small-file processing performance.
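The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Deng's grey relational coefficient with a distinguishing coefficient of 0.5, assumes that each index weight is its grey relational degree divided by the sum of all degrees (the abstract does not state how the weights are derived), and assumes the inputs are already normalized to [0, 1]. All function names and sample values are hypothetical.

```python
def grey_relational_degrees(reference, series, rho=0.5):
    """Grey relational degree of each index series to the reference series.

    reference: normalized file-size values, one per sample (hypothetical data).
    series:    dict mapping index name -> normalized values, same length.
    rho:       distinguishing coefficient (0.5 is a common choice, assumed here).
    """
    # Absolute difference between each index series and the reference, per sample.
    deltas = {name: [abs(r - x) for r, x in zip(reference, xs)]
              for name, xs in series.items()}
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every series coincides with the reference: full relation everywhere.
        return {name: 1.0 for name in series}
    degrees = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        degrees[name] = sum(coeffs) / len(coeffs)  # mean coefficient = degree
    return degrees


def weights_from_degrees(degrees):
    """Assumed weighting rule: each degree divided by the sum of all degrees."""
    total = sum(degrees.values())
    return {name: g / total for name, g in degrees.items()}


def cut_off_point(optimal_sizes, weights):
    """Weighted sum of the per-index optimal file sizes."""
    return sum(optimal_sizes[name] * weights[name] for name in optimal_sizes)


# Hypothetical, already-normalized measurements for five file sizes.
file_size_norm = [0.0, 0.25, 0.5, 0.75, 1.0]
indexes = {
    "M":    [1.0, 0.6, 0.35, 0.15, 0.0],
    "MUFS": [0.0, 0.4, 0.7, 0.9, 1.0],
    "MAFS": [0.0, 0.3, 0.55, 0.8, 1.0],
}
# Hypothetical stand-ins for FM, FMUFS and FMAFS, in MB.
optimal_sizes_mb = {"M": 4.0, "MUFS": 8.0, "MAFS": 6.0}

weights = weights_from_degrees(grey_relational_degrees(file_size_norm, indexes))
cut = cut_off_point(optimal_sizes_mb, weights)
print(weights, cut)
```

Because the weights are non-negative and sum to one, the resulting cut-off point always lies between the smallest and largest of the three per-index optima, which is one way to read the "balance" the abstract refers to.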

Key words: Hadoop Distributed File System (HDFS), small file, cut-off point, Cut-off Point via Grey Relational Analysis (Cut-GAR), grey relational analysis

CLC number: