Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (01): 211-214. DOI: 10.3724/SP.J.1087.2013.00211

• Network and distributed techno

Distributed storage solution based on parity coding

CHEN Dongxiao1,2, WANG Peng1,3

  1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China
  3. Parallel Computing Laboratory, Chengdu University of Information Technology, Chengdu, Sichuan 610025, China
  • Received: 2012-07-09 Revised: 2012-08-07 Online: 2013-01-01 Published: 2013-01-09
  • Contact: WANG Peng
  • Supported by:

    the National Natural Science Foundation of China under Grant 60872064

Distributed storage scheme based on parity-coding backup

CHEN Dongxiao (陈冬晓)1,2, WANG Peng (王鹏)1,3

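  1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049
  3. Parallel Computing Laboratory, Chengdu University of Information Technology, Chengdu 610225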
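  • Corresponding author: WANG Peng
  • About the authors: CHEN Dongxiao (born 1986), male, from Deyang, Sichuan, master's student; research interests: parallel computing, cloud computing. WANG Peng (born 1975), male, from Leshan, Sichuan, professor, doctoral supervisor, Ph.D., CCF senior member; research interests: cloud computing, parallel computing.
  • Supported by:

    National Natural Science Foundation of China (60872064); Innovation and Development Strategy Research Project (Soft Science) of the Chengdu Science and Technology Bureau (11RKYB016ZF)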

Abstract: To guarantee reliability, traditional cloud storage solutions generally back up data through mirror redundancy, which lowers the utilization of storage space. A storage solution was proposed to reduce the space occupied by redundant backup data. The solution introduced: 1) parity-coding backup instead of mirror backup, which reduced the size of the backup data; 2) a conflict-jump mechanism to verify the backup data, which guaranteed reliability while the number of backup copies was reduced. A comparison between the results of a simulation program and the performance of mainstream cloud storage solutions shows that the proposed solution significantly reduces the storage space used for distributed storage while still guaranteeing reliability.
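
The trade-off between parity-coded backup and mirror backup can be illustrated with a minimal sketch. The abstract does not specify the coding scheme, so the example assumes a single bytewise XOR parity block over k equally sized data blocks; the function names (`xor_parity`, `recover_block`, `overhead_parity`, `overhead_mirror`) are hypothetical and are not taken from the paper. It shows how one lost block can be rebuilt from the survivors plus the parity block, and how the stored-bytes-per-user-byte ratio compares with keeping full copies. The conflict-jump mechanism is not modeled here.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of a list of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity.

    XOR-ing the parity with every surviving block cancels them out and
    leaves the missing block (assumption: exactly one block was lost).
    """
    return xor_parity(list(surviving_blocks) + [parity])


def overhead_parity(k):
    """Stored bytes per byte of user data: k data blocks plus 1 parity block."""
    return (k + 1) / k


def overhead_mirror(copies):
    """Stored bytes per byte of user data when keeping `copies` full copies."""
    return float(copies)


# Illustrative data only: three 4-byte blocks, as if held on three nodes.
data = [b"node", b"data", b"here"]
parity = xor_parity(data)

# Simulate losing the second block and rebuilding it.
rebuilt = recover_block([data[0], data[2]], parity)
```

Under these assumptions, four data blocks with one parity block store 1.25 bytes per user byte, whereas three full copies (the default replication factor in HDFS) store 3 bytes per user byte. The price is that a single XOR parity block tolerates only one lost block per group, which is why a scheme of this kind needs an additional way to check that the backup is still valid.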

Key words: cloud storage, consistent hash, Hadoop Distributed File System (HDFS), data backup, data recovery

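Abstract: To guarantee availability, traditional cloud storage systems generally use mirrored redundant backup, which generates a large amount of redundant backup data and lowers the utilization of storage space. To reduce the storage space occupied by backup data, a storage scheme is proposed. It abandons mirrored redundant backup and instead backs up data with parity coding, which reduces the amount of backup data; it also adopts a conflict-jump mechanism to verify backups, reducing the number of backups while ensuring that the backup data remain valid. A comparison between the results of a simulation program and mainstream cloud storage schemes shows that the proposed scheme significantly reduces the disk space occupied by distributed storage while guaranteeing data reliability.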

Key words: cloud storage, consistent hashing, Hadoop Distributed File System (HDFS), data backup, data recovery

CLC Number: