Journal of Computer Applications, 2017, Vol. 37, Issue (8): 2209-2213. DOI: 10.11772/j.issn.1001-9081.2017.08.2209

• Advanced computing •

Storage load balancing algorithm based on storage entropy

ZHOU Weibo (周渭博)1,2, ZHONG Yong (钟勇)1, LI Zhendong (李振东)1,2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2017-02-23 Revised: 2017-04-26 Online: 2017-08-10 Published: 2017-08-12
  • Corresponding author: ZHOU Weibo
  • About the authors: ZHOU Weibo (born 1981), male, from Yantai, Shandong, is a Ph.D. candidate and CCF member whose main research interest is big data. ZHONG Yong (born 1966), male, from Yuechi, Sichuan, is a research fellow and doctoral supervisor who holds a Ph.D. and is a CCF member; his main research interests are software engineering and big data. LI Zhendong (born 1990), male, from Yinchuan, Ningxia, is a Ph.D. candidate whose main research interest is data mining.
  • Supported by:
    This work is partially supported by the Science and Technology Support Plan of Sichuan Province (2014GZ0013).

Abstract: In distributed storage systems, Disk space Utilization (DU) is generally used to judge how well load is balanced across storage nodes, and the system is considered balanced when every node has the same disk space utilization. In practice, however, storage nodes with low disk I/O rates or low reliability often become the bottleneck for the read-write performance of the whole storage system. Therefore, in heterogeneous distributed storage systems, especially those whose nodes differ widely in disk I/O rate and reliability, read-write efficiency is inevitably limited if disk space utilization is the only criterion for storage load balance. A new approach to measuring storage load balance in distributed storage systems is proposed from the perspective of read-write efficiency. Storage Entropy (SE) is defined on the basis of load balancing theory and entropy theory, and a load balancing algorithm based on SE is proposed. The algorithm quantitatively adjusts the storage load of a distributed storage system through three steps: system load determination, single-node load determination, and load migration. Experiments comparing it with a load balancing algorithm based on disk space utilization show that the proposed algorithm balances storage load well, effectively controls system load imbalance, and improves the overall read-write efficiency of the distributed storage system.

Key words: storage entropy, storage load balancing, read-write efficiency, distributed storage system
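
The three steps named in the abstract (system load determination, single-node load determination, load migration) can be illustrated with a minimal Python sketch. The abstract does not give the formula for SE, so everything concrete here is an assumption made for illustration only: the per-node cost `used_gb / (io_rate * reliability)`, the use of normalized Shannon entropy over the cost shares, the `threshold` and `step_gb` parameters, and the names `Node`, `storage_entropy`, and `rebalance`. It is not the authors' definition or algorithm, and it ignores disk capacity limits.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    used_gb: float      # data currently stored on the node
    io_rate: float      # assumed metric: sustained disk I/O rate in MB/s (> 0)
    reliability: float  # assumed metric: score in (0, 1]


def costs(nodes):
    # Hypothetical per-node cost: stored data divided by effective throughput.
    return [n.used_gb / (n.io_rate * n.reliability) for n in nodes]


def storage_entropy(nodes):
    # Assumed stand-in for SE: normalized Shannon entropy of the cost shares.
    # 1.0 means every node carries the same cost; lower means more imbalance.
    c = costs(nodes)
    total = sum(c)
    if total == 0 or len(nodes) < 2:
        return 1.0
    shares = [x / total for x in c]
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(nodes))


def rebalance(nodes, threshold=0.99, step_gb=1.0, max_moves=10_000):
    # System load determination: loop while the entropy is below the threshold.
    moves = 0
    while storage_entropy(nodes) < threshold and moves < max_moves:
        # Single-node load determination: pick the costliest and cheapest node.
        c = costs(nodes)
        src = nodes[c.index(max(c))]
        dst = nodes[c.index(min(c))]
        if src is dst or src.used_gb < step_gb:
            break
        # Load migration: move one fixed-size step of data from src to dst.
        src.used_gb -= step_gb
        dst.used_gb += step_gb
        moves += 1
    return moves


if __name__ == "__main__":
    cluster = [
        Node("a", 500, 100, 1.0),
        Node("b", 500, 200, 1.0),
        Node("c", 500, 50, 0.9),
    ]
    print("entropy before:", storage_entropy(cluster))
    print("moves:", rebalance(cluster))
    print("entropy after:", storage_entropy(cluster))
```

In the sample cluster all three nodes store the same amount of data, so a DU-based check (assuming equal capacities) would call it balanced, while the assumed entropy measure flags the slow, less reliable node `c` as overloaded and shifts data toward the faster node `b`.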

CLC number: