Journal of Computer Applications (计算机应用)

• Artificial Intelligence and Simulation •

A Storage Load Balancing Algorithm Based on Storage Entropy

Wei-Bo ZHOU (周渭博)1, Yong ZHONG (钟勇)2, Zhen-Dong LI (李振东)2

  1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences
  2. Chengdu Institute of Computer Applications, Chinese Academy of Sciences
  • Received: 2017-02-22 Revised: 2017-04-10 Online: 2017-04-10 Published: 2017-05-13
  • Corresponding author: Wei-Bo ZHOU (周渭博)

Abstract: In distributed storage systems, disk space utilization is generally used to judge how well load is balanced across storage nodes, and the system is considered balanced when all nodes have equal disk space utilization. In practice, however, storage nodes with low disk I/O rates and nodes with low reliability often become the bottleneck for the read/write performance of the whole system. In a heterogeneous distributed storage system, particularly one whose nodes differ widely in disk I/O rate and reliability, using disk space utilization as the sole criterion for storage load balance therefore limits read/write efficiency. This paper proposes a new way to measure storage load balance in distributed storage systems from the perspective of read/write efficiency. Storage entropy is defined on the basis of load balancing theory and entropy theory, and a load balancing algorithm based on storage entropy (SE) is proposed. The algorithm adjusts the storage load of a distributed storage system quantitatively through three steps: system load determination, single-node load determination, and load migration. Experiments comparing it with a load balancing algorithm based on disk space utilization (DU) show that the proposed algorithm balances storage load well, effectively controls system load imbalance, and improves the overall read/write efficiency of the distributed storage system.
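The abstract does not state the formal definition of storage entropy, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that each node's capability is the product of its disk I/O rate and its reliability, that a node's relative load is its stored data divided by that capability, and that storage entropy is the normalized Shannon entropy of the relative-load shares. The names `Node`, `storage_entropy`, and `rebalance`, the threshold, and the step size are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    stored_gb: float    # data currently stored on the node
    io_rate: float      # disk I/O rate (illustrative units, must be > 0)
    reliability: float  # in (0, 1], illustrative

    @property
    def capability(self) -> float:
        # Assumption: capability = I/O rate x reliability (not from the paper).
        return self.io_rate * self.reliability

    @property
    def relative_load(self) -> float:
        return self.stored_gb / self.capability


def storage_entropy(nodes):
    """Normalized Shannon entropy of relative-load shares, in [0, 1].

    A value of 1.0 means data is spread in proportion to node capability.
    """
    total = sum(n.relative_load for n in nodes)
    if total == 0 or len(nodes) < 2:
        return 1.0
    shares = [n.relative_load / total for n in nodes]
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(nodes))


def rebalance(nodes, threshold=0.99, step_gb=1.0, max_moves=10_000):
    """Greedy sketch of the three steps named in the abstract."""
    moves = 0
    # System load determination: the system counts as unbalanced
    # while the entropy is below the threshold.
    while storage_entropy(nodes) < threshold and moves < max_moves:
        # Single-node load determination: pick the most and least loaded nodes.
        src = max(nodes, key=lambda n: n.relative_load)
        dst = min(nodes, key=lambda n: n.relative_load)
        # Load migration: move a fixed amount of data between them.
        amount = min(step_gb, src.stored_gb)
        src.stored_gb -= amount
        dst.stored_gb += amount
        moves += 1
    return moves


if __name__ == "__main__":
    cluster = [
        Node("fast", 300.0, 200.0, 0.9),
        Node("medium", 300.0, 100.0, 1.0),
        Node("slow", 300.0, 50.0, 0.8),
    ]
    print("entropy before:", storage_entropy(cluster))
    print("moves:", rebalance(cluster))
    print("entropy after:", storage_entropy(cluster))
```

Under these assumptions, equal disk usage on all three nodes scores below the threshold because the slow, less reliable node carries a disproportionate share, which is the situation the abstract argues a utilization-only criterion fails to detect.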