Abstract: To address the sharp growth of cloud data driven by the development and adoption of cloud native technology, as well as the performance and stability bottlenecks of that technology, a Haystack-based storage system is proposed. With optimizations in service discovery, automatic fault tolerance, and caching, the system is better suited to cloud native workloads and meets the growing, high-frequency file storage and read/write requirements of the data acquisition, storage, and analysis industries. Its object storage model supports massive file storage under high-frequency reads and writes. The system provides a simple, unified application interface to the businesses that use it, applies a file caching strategy to improve resource utilization, and adopts the rich automation toolchain of Kubernetes to make it easier to deploy, easier to scale, and more stable than other storage systems. Experimental results indicate that, for large-scale storage of fragmented data with more reads than writes, the proposed system offers some improvement in performance and stability over current mainstream object storage and file systems.
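The Haystack-style object model and the file caching strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `HaystackVolume`, the record header layout, the LRU eviction policy, and the `disk_reads` counter are all assumptions made for illustration. The sketch packs many small files into one append-only volume, keeps an in-memory index from key to offset, and serves repeated reads from a small cache.

```python
import io
import struct
from collections import OrderedDict


class HaystackVolume:
    """Hypothetical append-only volume that packs many small files into one."""

    # Assumed record header: 8-byte key followed by 4-byte payload size.
    HEADER = struct.Struct("<QI")

    def __init__(self, backing=None, cache_size=2):
        # An in-memory buffer stands in for a large on-disk volume file.
        self._file = backing if backing is not None else io.BytesIO()
        self._index = {}             # key -> (payload offset, payload size)
        self._cache = OrderedDict()  # LRU cache of recently read payloads
        self._cache_size = cache_size
        self.disk_reads = 0          # counts reads that miss the cache

    def put(self, key, data):
        """Append a record and point the in-memory index at its payload."""
        self._file.seek(0, io.SEEK_END)
        offset = self._file.tell()
        self._file.write(self.HEADER.pack(key, len(data)))
        self._file.write(data)
        self._index[key] = (offset + self.HEADER.size, len(data))
        self._cache.pop(key, None)   # drop any stale cached copy

    def get(self, key):
        """Return the payload for key, using the cache when possible."""
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        offset, size = self._index[key]   # one lookup, then one seek and read
        self._file.seek(offset)
        data = self._file.read(size)
        self.disk_reads += 1
        self._cache[key] = data
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict least recently used
        return data


if __name__ == "__main__":
    vol = HaystackVolume(cache_size=2)
    vol.put(1, b"first file")
    vol.put(2, b"second file")
    print(vol.get(2), vol.get(2), vol.disk_reads)
```

Because the index lives in memory, a read costs at most one seek into the volume, which is the property that makes this layout attractive for read-heavy workloads with many small files.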
LIU Fuxin, LI Jingwei, WANG Yihong, LI Lin. Design and implementation of cloud native massive data storage system based on Kubernetes[J]. Journal of Computer Applications, 2020, 40(2): 547-552.