Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (08): 2150-2158. DOI: 10.3724/SP.J.1087.2012.02150

• Advanced computing •

Improved HDFS system based on erasure codes and a dynamic replication strategy

LI Xiao-kai1,2, DAI Xiang1,2, LI Wen-jie1,2, CUI Zhe1

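  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2012-02-22 Revised: 2012-03-26 Online: 2012-08-28 Published: 2012-08-01
  • Corresponding author: LI Xiao-kai
  • About the authors: LI Xiao-kai (born 1988), male, from Chizhou, Anhui, master's student; main research interests: distributed storage, distributed computing;
    DAI Xiang (born 1983), male, from Xinyang, Henan, doctoral student; main research interests: computer networks, information security;
    LI Wen-jie (born 1987), male, from Yiyang, Hunan, master's student; main research interests: information coding, distributed storage;
    CUI Zhe (born 1970), male, from Bazhong, Sichuan, research fellow; main research interests: software engineering, computer networks, information security.
  • Funding:
    National 863 Program of China (2008AAO1Z402)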

Improved HDFS scheme based on erasure code and dynamical-replication system

LI Xiao-kai1,2, DAI Xiang1,2, LI Wen-jie1,2, CUI Zhe1

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2012-02-22 Revised: 2012-03-26 Online: 2012-08-28 Published: 2012-08-01
  • Contact: LI Xiao-kai

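Abstract: To give the Hadoop Distributed File System (HDFS) higher storage efficiency and better load balancing, this paper proposes Noah, an improvement on the multi-replica storage technique used by HDFS. Noah introduces encoding and decoding modules that encode and split each HDFS block into a larger number of data sections, which are scattered at random across the cluster and replace the original system's multi-replica disaster-recovery strategy. When nodes in the cluster fail, the original data is recovered by collecting any roughly 70% of the sections that belong to the failed block. In addition, a dynamic replication strategy is applied according to the running state of the distributed cluster and the differing demands on the number of replicas. Cluster experiments show that Noah improves on HDFS in disaster-recovery efficiency, load balancing, storage cost, and security.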
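The encode, scatter, and recover cycle described in the abstract can be illustrated with a small sketch. The abstract does not name the erasure code that Noah uses, so the block below swaps in a simple Reed-Solomon-style polynomial code over the prime field GF(257), with assumed parameters k = 7 and n = 10 chosen so that any 7 of the 10 sections (70%) are enough for recovery. The function names `encode_block`, `decode_block`, and `lagrange_eval` are hypothetical and are not taken from Noah or from HDFS.

```python
# Toy erasure code: NOT Noah's actual code, only an illustration of
# "any k of n sections recover the block". Requires Python 3.8+.
P = 257  # prime just above the byte range, so every byte is a field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode_block(block, k, n):
    """Split a block into stripes of k bytes and extend each stripe to
    n symbols. Section x collects symbol x of every stripe. Sections
    0..k-1 hold the raw bytes; the rest hold parity values in 0..256."""
    padded = block + bytes(-len(block) % k)  # zero-pad to a multiple of k
    sections = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        stripe = padded[start:start + k]
        points = list(enumerate(stripe))  # data byte i sits at x = i
        for x in range(n):
            symbol = stripe[x] if x < k else lagrange_eval(points, x)
            sections[x].append(symbol)
    return sections


def decode_block(available, k, length):
    """Rebuild the original block from any k surviving sections.
    `available` maps section index -> list of symbols."""
    if len(available) < k:
        raise ValueError("not enough sections to recover the block")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, symbols[s]) for x, symbols in chosen]
        out.extend(lagrange_eval(points, x) for x in range(k))
    return bytes(out[:length])


block = b"Noah stores each HDFS block as coded sections."
sections = encode_block(block, k=7, n=10)
# Simulate three failed nodes: sections 0, 4 and 9 are lost.
survivors = {i: s for i, s in enumerate(sections) if i not in (0, 4, 9)}
recovered = decode_block(survivors, k=7, length=len(block))
print(recovered == block)
```

Because each stripe is a polynomial of degree below k, any k distinct evaluations determine it, which is why it does not matter which three sections are lost. A production system would more likely work over GF(2^8) so that every symbol fits in one byte.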

Keywords: Hadoop Distributed File System (HDFS), distributed storage, data disaster recovery, load balancing, dynamic replication

Abstract: To improve the storage efficiency and load-balancing ability of the Hadoop Distributed File System (HDFS), this paper presents an improved solution named Noah that replaces the original multiple-replication strategy. Noah introduces a coding module into HDFS. Instead of keeping multiple replicas as the original system does, the module encodes every HDFS data block into a larger number of data sections (pieces) and disperses them across the nodes of the storage cluster. When nodes in the cluster fail, the original data is recovered by collecting and decoding any 70% or so of the sections. A dynamic replication strategy works alongside the coding module and changes the number of copies with demand. Experimental results on storage clusters show the feasibility and advantages of the proposed solution.
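The storage-cost argument can be made concrete with a short calculation. This is a rough sketch under stated assumptions, not a figure reported by the paper: it assumes an ideal code in which any k of n sections recover a block, picks k = 7 and n = 10 to match the 70% threshold, and compares it with the HDFS default replication factor of 3. The helper names are hypothetical.

```python
def coded_overhead(k, n):
    """Bytes stored per byte of user data when a block of k data
    symbols is expanded into n coded sections (assumes an ideal code)."""
    return n / k


def coded_tolerated_losses(k, n):
    """Sections that may be lost while the block stays recoverable."""
    return n - k


def replicated_overhead(replicas):
    """Bytes stored per byte of user data with full replication."""
    return float(replicas)


def replicated_tolerated_losses(replicas):
    """Copies that may be lost while one intact copy remains."""
    return replicas - 1


K, N = 7, 10   # assumed parameters: any 7 of 10 sections suffice
REPLICAS = 3   # HDFS default replication factor

print(f"coded:      {coded_overhead(K, N):.2f}x storage, "
      f"survives {coded_tolerated_losses(K, N)} lost sections")
print(f"replicated: {replicated_overhead(REPLICAS):.2f}x storage, "
      f"survives {replicated_tolerated_losses(REPLICAS)} lost copies")
```

Under these assumptions the coded layout stores about 1.43 bytes per byte of user data instead of 3, while tolerating the loss of three sections rather than two copies. The trade-off is that recovery must read k sections and run a decoder rather than copy a single replica.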

Key words: Hadoop Distributed File System (HDFS), distributed storage, data disaster recovery, load-balance, dynamic replication

CLC number: