Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 767-775.DOI: 10.11772/j.issn.1001-9081.2022010074

• Data science and technology • Previous Articles    

Load balancing method based on local repair code in distributed storage

Yunbo LONG1,2, Dan TANG3()   

  1. 1.Chengdu Institute of Computer Application,Chinese Academy of Sciences,Chengdu Sichuan 610041,China
    2.School of Computer Science and Technology,University of Chinese Academy of Sciences,Beijing 100049,China
    3.School of Software Engineering,Chengdu University of Information Technology,Chengdu Sichuan 610225,China
  • Received:2022-01-20 Revised:2022-03-19 Accepted:2022-03-21 Online:2022-04-21 Published:2023-03-10
  • Contact: Dan TANG
  • About author:LONG Yunbo, born in 1997, M. S. candidate. His research interests include distributed storage, coding theory.
  • Supported by:
    Major Science and Technology Program of Sichuan Province(2019ZDZX0005)

分布式存储中基于局部修复码的负载均衡方法

龙运波1,2, 唐聃3()   

  1. 1.中国科学院 成都计算机应用研究所, 成都 610041
    2.中国科学院大学 计算机科学与技术学院, 北京 100049
    3.成都信息工程大学 软件工程学院, 成都 610225
  • 通讯作者: 唐聃
  • 作者简介:龙运波(1997—),男,贵州黔东南人,硕士研究生,主要研究方向:分布式存储、编码理论
    唐聃(1982—),男,四川绵阳人,教授,博士,CCF会员,主要研究方向:编码理论、分布式存储。
  • 基金资助:
    四川省重大科技专项(2019ZDZX0005)

Abstract:

For the low performance of hot data access in distributed storage, a load balancing method based on Local Repair Code (LRC) was proposed, which uses coding to avoid centralized access of nodes and improve the access efficiency of hot data. Firstly, a kind of special LRC suitable for small-scale storage systems was constructed by using Balanced Incomplete Block Design (BIBD), and it was able to provide multiple access methods for encoded data. Secondly, based on Reed Solomon (RS) code and random array code, LRC was extended to a larger scale situation, and it was able to meet certain fault tolerance requirements of the storage system. Finally, a hot data access algorithm was given to reduce the pressure of hot data access, and combined with a reasonable data layout scheme, the load balancing of the storage system in high-frequency access scenarios was achieved. Theoretical analysis and experimental results show that the proposed method can achieve load balancing with very small cost, and its effect is significantly better than that of the load balancing method implemented by multiple copies and Maximum-Distance-Separable (MDS) code in the traditional method. Especially, the proposed method solves the load imbalance problem caused by uneven access to hot and cold data, which can effectively improve the access efficiency of hot data storage systems.

Key words: distributed storage, hot data, Local Repair Code (LRC), load balancing, data layout

摘要:

针对分布式存储中热数据访问性能低下的问题,提出一种基于局部修复码(LRC)的负载均衡方法,采用编码的方式规避节点的集中式访问,并提高热数据的访问效率。首先,利用平衡不完全区组设计(BIBD)构造一类适用于小规模存储系统的特殊LRC,从而为编码数据提供多种访问方式;然后,分别基于里所(RS)码和随机阵列码将LRC推广到更大规模,并使它满足存储系统一定的容错需求;最后,提出一种热数据访问算法以降低热数据的访问压力,并结合合理的数据布局方案实现存储系统在高频访问场景下的负载均衡。理论分析和实验结果表明,所提方法能以极小的代价实现负载均衡,明显优于传统方法中利用多副本及最大距离可分(MDS)码实现的负载均衡方法,尤其是解决了因冷热数据访问不均带来的负载失衡问题,可以有效提高热数据存储系统的访问效率。

关键词: 分布式存储, 热数据, 局部修复码, 负载均衡, 数据布局

CLC Number: