Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (10): 2942-2950.DOI: 10.11772/j.issn.1001-9081.2020010127

• Data science and technology • Previous Articles     Next Articles

Erasure code with low recovery-overhead in distributed storage systems

ZHANG Hang, LIU Shanzheng, TANG Dan, CAI Hongliang   

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu Sichuan 610225, China
  • Received:2020-02-13 Revised:2020-04-16 Online:2020-10-10 Published:2020-04-28
  • Supported by:
    This work is partially supported by the Sichuan Science and Technology Program (2020YFG0150), the Sichuan Artificial Intelligence Major Project (2018GZDZX0030), the Sichuan Science and Technology Achievement Transfer and Transformation Demonstration Project (2018CC0093).


张航, 刘善政, 唐聃, 蔡红亮   

  1. 成都信息工程大学 软件工程学院, 成都 610225
  • 通讯作者: 唐聃
  • 作者简介:张航(1995-),男,四川乐山人,硕士研究生,主要研究方向:编码理论、分布式存储;刘善政(1995-),男,河南濮阳人,硕士研究生,主要研究方向:图像秘密共享;唐聃(1982-),男,四川绵阳人,教授,博士,CCF会员,主要研究方向:编码理论、分布式存储;蔡红亮(1983-),男,山西临汾人,讲师,博士,主要研究方向:编码理论、视觉密码。
  • 基金资助:

Abstract: Erasure code technology is a typical data fault tolerance method in distributed storage systems. Compared with multi-copy technology, it can provide high data reliability with low storage overhead. However, the high repair cost limits the practical applications of erasure code technology. Aiming at problems of high repair cost, complex encoding and poor flexibility of existing erasure codes, a simple-encoding erasure code with low repair cost - Rotation Group Repairable Code (RGRC) was proposed. According to RGRC, multiple strips were combined into a strip set at first. After that, the association relationship between the strips was used to hierarchically rotate and encode the data blocks in the strip set to obtain the corresponding redundant blocks. RGRC greatly reduced the amount of data needed to be read and transmitted in the process of single-node repair, thus saving a lot of network bandwidth resources. Meanwhile, RGRC still retained high fault tolerance when solving the problem of high repair cost of a single node. And, in order to meet the different needs of distributed storage systems, RGRC was able to flexibly weigh the storage overhead and repair cost of the system. Comparison experiments were conducted on a distributed storage system, the experimental analysis shows that compared with RS (Reed-Solomon) codes, LRC (Locally Repairable Codes), basic-Pyramid, DLRC (Dynamic Local Reconstruction Codes), pLRC (proactive Locally Repairable Codes), GRC (Group Repairable Codes) and UFP-LRC (Unequal Failure Protection based Local Reconstruction Codes), RGRC can reduce the repair cost of single node repair by 14%-61% through adding a small amount of storage overhead, and reduces the repair time by 14%-58%.

Key words: distributed storage system, data recovery, single-node repair, erasure code, low repair cost

摘要: 纠删码技术是分布式存储系统中典型的数据容错方法,与多副本技术相比,能够以较低的存储开销提供较高的数据可靠性;然而,纠删码修复成本过高的特点限制了其应用。针对现有纠删码修复成本高、编码复杂和灵活性差的问题,提出一种编码简单的低修复成本的纠删码——旋转分组修复码(RGRC)。RGRC首先将多个条带组合成条带集,然后利用条带之间的关联关系对条带集内的数据块进行分层旋转编码,以此得到相应的冗余块。RGRC大幅度地减少了单节点修复过程中所需要读取和传输的数据量,从而能节省大量的网络带宽资源。同时RGRC在解决单节点修复成本高的问题时,依然保留着较高的容错能力,且为满足分布式存储系统的不同需求,可以灵活地权衡系统的存储开销和修复成本。在分布式存储系统中进行的对比实验分析结果展示,与其他常用的RS(Reed-Solomon)码、LRC(Locally Repairable Codes)、basic-Pyramid、DLRC(Dynamic Local Reconstruction Codes)、pLRC(proactive Locally Repairable Codes)、GRC(Group Repairable Codes)、UFP-LRC(Unequal Failure Protection based Local Reconstruction Codes)相比,RGRC只需要增加少量的存储开销,就能降低单节点修复14%~61%的修复成本,同时减少14%~58%的修复时间。

关键词: 分布式存储系统, 数据修复, 单节点修复, 纠删码, 低修复成本

CLC Number: