Abstract: The performance of a Ceph system depends heavily on its configuration parameters. A Ceph cluster exposes many kinds of configuration parameters with complex meanings, which makes fast and accurate tuning difficult. To address this problem, a parameter tuning method based on Random Forest (RF) and Genetic Algorithm (GA) is proposed to adjust the Ceph parameter configuration automatically and thereby optimize system performance. The RF algorithm is used to build a performance prediction model for the Ceph system; the output of the prediction model serves as the input of the GA, which then optimizes the parameter configuration scheme automatically and iteratively. Simulation results show that, compared with the default parameter configuration, the optimized configuration improves the read and write performance of the Ceph file system by about 1.4 times, and that the optimization time is much lower than that of the black-box parameter tuning method.
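The GA stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names and ranges (`param_a`, `param_b`, `param_c`), the GA settings, and the function `predict_throughput` are all hypothetical. In particular, `predict_throughput` is a synthetic stand-in for the trained RF performance model; in practice the `predict` method of a fitted regressor would take its place.

```python
import random

# Hypothetical tunable parameters with (low, high) integer ranges.
# Names and ranges are placeholders, not taken from the paper.
PARAM_RANGES = {
    "param_a": (1, 64),
    "param_b": (1, 32),
    "param_c": (64, 4096),
}
NAMES = list(PARAM_RANGES)


def predict_throughput(config):
    """Stand-in for the trained RF model (assumption): config -> predicted score.

    A synthetic function whose maximum (0.0) lies at (32, 8, 1024).
    """
    targets = {"param_a": 32, "param_b": 8, "param_c": 1024}
    score = 0.0
    for name in NAMES:
        low, high = PARAM_RANGES[name]
        score -= ((config[name] - targets[name]) / (high - low)) ** 2
    return score


def random_config(rng):
    """Draw one configuration uniformly from the allowed ranges."""
    return {n: rng.randint(*PARAM_RANGES[n]) for n in NAMES}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is copied from one parent at random."""
    return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in NAMES}


def mutate(config, rng, rate=0.2):
    """Resample each parameter within its range with probability `rate`."""
    return {
        n: (rng.randint(*PARAM_RANGES[n]) if rng.random() < rate else v)
        for n, v in config.items()
    }


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]


def genetic_search(predict, generations=40, pop_size=30, elite=2, seed=0):
    """Search for the configuration that maximizes the predicted score."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # The prediction model's output is the GA's fitness input.
        fitness = [predict(c) for c in population]
        ranked = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        # Elitism: carry the best configurations over unchanged.
        next_gen = [dict(population[i]) for i in ranked[:elite]]
        while len(next_gen) < pop_size:
            child = crossover(
                tournament(population, fitness, rng),
                tournament(population, fitness, rng),
                rng,
            )
            next_gen.append(mutate(child, rng))
        population = next_gen
    return max(population, key=predict)


if __name__ == "__main__":
    best = genetic_search(predict_throughput)
    print(best, predict_throughput(best))
```

Because every candidate is scored by the prediction model rather than by a benchmark run on the cluster, each GA generation is cheap, which is consistent with the abstract's claim of a shorter optimization time than black-box tuning.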
陈禹, 毛莺池. 基于随机森林和遗传算法的Ceph参数自动调优[J]. 计算机应用, 2020, 40(2): 347-351.
CHEN Yu, MAO Yingchi. Automatic tuning of Ceph parameters based on random forest and genetic algorithm. Journal of Computer Applications, 2020, 40(2): 347-351.