Mainstream cloud storage systems usually adopt a master/slave architecture, which may cause performance bottlenecks and scalability problems in some extreme cases. Fully distributed cloud storage systems based on Distributed Hash Table (DHT) technology are therefore becoming a new alternative, and solving the node load-balancing problem is the key to making this technology practical. In this work, the Kademlia algorithm was used to locate storage targets in a cloud storage system, and its load-balancing performance was investigated. Because the load-balancing performance of the algorithm decreases significantly in heterogeneous environments, an improved algorithm was proposed that takes heterogeneous nodes and their storage capacities into account and distributes load according to the storage capacity of each node. The simulation results show that the proposed algorithm effectively improves the load balance of the system. Compared with the original algorithm, after a long run (more than 1500 hours in simulation), the number of overloaded nodes in the system dropped by an average of 7.0% (light load) to 33.7% (heavy load), the file-saving success rate increased by an average of 27.2% (light load) to 35.1% (heavy load), and the communication overhead remained acceptable.
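To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The XOR distance and "store on the k closest node IDs" rule follow standard Kademlia; the sketch simplifies by ranking a global node list instead of performing iterative k-bucket lookups. The capacity-aware step is a hypothetical stand-in, because the abstract does not specify the improved algorithm: among the k XOR-closest nodes, it picks the one whose utilization (used/capacity) would be lowest after storing the file. The names `Node`, `capacity`, `used`, `choose_target`, and the default `k=3` are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def make_id(name: str) -> int:
    """Map a node name or file name to a 160-bit identifier via SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


@dataclass
class Node:
    name: str
    capacity: int  # total storage in arbitrary units (hypothetical field)
    used: int = 0  # storage already consumed

    @property
    def node_id(self) -> int:
        return make_id(self.name)


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR interpreted as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: List[Node], k: int) -> List[Node]:
    """Return the k nodes whose IDs are XOR-closest to key.
    Simplification: ranks a global list rather than doing iterative lookups."""
    return sorted(nodes, key=lambda n: xor_distance(n.node_id, key))[:k]


def choose_target(file_name: str, size: int, nodes: List[Node], k: int = 3) -> Optional[Node]:
    """Hypothetical capacity-aware placement (an assumption, not the paper's rule):
    among the k XOR-closest nodes that can still fit the file, pick the one with
    the lowest utilization after the write. Returns None if none fits (failed save)."""
    key = make_id(file_name)
    candidates = [n for n in closest_nodes(key, nodes, k) if n.used + size <= n.capacity]
    if not candidates:
        return None
    target = min(candidates, key=lambda n: (n.used + size) / n.capacity)
    target.used += size
    return target


if __name__ == "__main__":
    # Three heterogeneous nodes; with k=3 every node is a candidate.
    demo = [Node("node-a", 100), Node("node-b", 400), Node("node-c", 50)]
    for i in range(20):
        choose_target(f"file-{i}.dat", 10, demo)
    for n in demo:
        print(n.name, n.used, "/", n.capacity)
```

With plain Kademlia, a file lands on the XOR-closest node regardless of how full it is, so small nodes overflow first in a heterogeneous cluster. The utilization-based tie to capacity in the sketch is one simple way to let larger nodes absorb proportionally more files; a virtual-node scheme (more IDs for larger nodes) would be another.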