Design and implementation of cloud native massive data storage system based on Kubernetes

DOI: 10.11772/j.issn.1001-9081.2019101732

Journal of Computer Applications, 2020, Vol. 40, Issue 2: 547-552.

CCF Bigdata 2019

Fuxin LIU, Jingwei LI, Yihong WANG, Lin LI

School of Computer Science and Technology, Wuhan University of Technology, Wuhan Hubei 430070, China

Received: 2019-08-30; Revised: 2019-10-14; Accepted: 2019-10-18; Online: 2019-11-18; Published: 2020-02-10
Contact: Fuxin LIU
About the authors:
LI Jingwei, born in 1999, from Songyang, Zhejiang. His research interests include machine learning and data mining.
WANG Yihong, born in 1999, from Binzhou, Shandong. His research interests include algorithm optimization and its applications.
LI Lin, born in 1977, from Hengyang, Hunan. Ph.D., professor, CCF member. Her research interests include information retrieval and recommendation systems.
Supported by: the National Innovation and Entrepreneurship Training Program for Undergraduates (20181049710013)

Abstract

To address the sharp increase in cloud data brought about by the development and popularization of cloud native technology, as well as the performance and stability bottlenecks of that technology, a Haystack-based storage system was proposed. Optimized in service discovery, automatic fault tolerance and caching, the system is better suited to cloud native business and meets the growing, high-frequency file storage and read/write requirements of the data acquisition, storage and analysis industries. Its object storage model supports massive file storage with high-frequency reads and writes. The system provides a simple, unified application interface for the businesses that use it, applies a file caching strategy to improve resource utilization, and adopts the rich automated tool chain of Kubernetes so that it is easier to deploy and expand, and more stable, than other storage systems. Experimental results indicate that, for large-scale fragmented data storage with more reads than writes, the proposed system achieves some improvement in performance and stability over current mainstream object storage and file systems.

Key words: file system, object storage, cloud computing, container orchestration, cloud native business
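The abstract names two mechanisms without detailing them: a Haystack-based object storage model and a file caching strategy. As a minimal sketch, assuming the general idea of Facebook's Haystack photo store (many small files packed into one append-only volume, located through an in-memory index) and assuming an LRU policy for the read cache, the two could fit together as below. The record layout, the class names `HaystackVolume` and `CachedStore`, and the LRU choice are hypothetical illustrations, not the authors' implementation; the abstract does not say which caching policy the system uses.

```python
import io
import struct
from collections import OrderedDict

# Record header for this sketch only: 8-byte unsigned key, 4-byte data length (little-endian).
HEADER = struct.Struct("<QI")


class HaystackVolume:
    """Append-only volume that packs many small files into one large file.

    An in-memory index maps each key to (offset, size), so a read costs one
    seek plus one read instead of a per-file metadata lookup.
    """

    def __init__(self, fileobj):
        self.f = fileobj
        self.index = {}  # key -> (offset of data, length of data)

    def put(self, key, data):
        self.f.seek(0, io.SEEK_END)
        start = self.f.tell()
        self.f.write(HEADER.pack(key, len(data)))
        self.f.write(data)
        # An overwrite appends a new record; the index now points at the newest copy.
        self.index[key] = (start + HEADER.size, len(data))

    def get(self, key):
        offset, size = self.index[key]  # raises KeyError for unknown keys
        self.f.seek(offset)
        return self.f.read(size)

    def delete(self, key):
        # Only the index entry is dropped; reclaiming space (compaction) is not shown.
        del self.index[key]


class CachedStore:
    """LRU read cache in front of a volume (the eviction policy is an assumption)."""

    def __init__(self, volume, capacity):
        self.volume = volume
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        data = self.volume.get(key)
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data

    def put(self, key, data):
        self.volume.put(key, data)
        self.cache.pop(key, None)  # drop any stale cached copy

    def delete(self, key):
        self.volume.delete(key)
        self.cache.pop(key, None)


if __name__ == "__main__":
    store = CachedStore(HaystackVolume(io.BytesIO()), capacity=2)
    store.put(1, b"photo-1")
    store.put(2, b"photo-2")
    store.get(1)
    store.get(1)
    store.get(2)
    print(store.hits, store.misses)  # 1 2
```

Appending every write keeps small-file writes sequential, while the index keeps reads to a single seek; the cache then only pays off when the same objects are read repeatedly, which matches the read-heavy workloads the abstract targets.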

CLC Number: TP392

Fuxin LIU, Jingwei LI, Yihong WANG, Lin LI. Design and implementation of cloud native massive data storage system based on Kubernetes[J]. Journal of Computer Applications, 2020, 40(2): 547-552.

Figures/Tables 4

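Table: read and write times of each storage scheme by file size and thread count (mean in ms, and variance)
Storage scheme	File size/KB	50 threads: read mean/ms	50 threads: read variance	50 threads: write mean/ms	50 threads: write variance	500 threads: read mean/ms	500 threads: read variance	500 threads: write mean/ms	500 threads: write variance	1 000 threads: read mean/ms	1 000 threads: read variance	1 000 threads: write mean/ms	1 000 threads: write variance	5 000 threads: read mean/ms	5 000 threads: read variance	5 000 threads: write mean/ms	5 000 threads: write variance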
EXT4	4	0.4	0.16	0.6	0.53	1.5	0.76	2.6	1.56	7.9	3.12	12.5	6.32	80.5	29.1	135.2	47.1
	40	1.2	0.48	1.1	0.51	3.4	0.87	5.8	3.51	10.2	4.40	15.8	16.63	102.1	34.5	170.4	59.3
	400	2.2	0.54	2.4	0.97	5.2	0.35	10.1	5.35	14.9	9.20	20.3	18.25	133.4	48.2	221.4	62.4
ZFS	4	2.2	1.13	2.2	0.57	4.8	1.92	4.3	2.21	12.2	4.21	14.2	11.16	87.4	56.4	155.4	69.9
	40	2.7	2.21	4.1	1.76	5.3	2.42	10.2	8.34	22.4	14.10	22.5	20.41	114.4	63.1	212.2	75.8
	400	4.1	2.97	13.3	2.31	9.3	2.93	25.9	13.54	30.1	16.44	55.1	41.25	194.9	85.8	255.7	96.2
Kubestorage	4	0.6	0.49	3.4	1.45	1.7	0.31	18.2	4.50	7.1	2.66	30.4	10.23	36.3	4.5	144.2	67.1
	40	1.3	0.66	19.4	2.40	3.7	0.42	36.1	18.40	10.1	3.58	72.5	15.25	40.1	10.1	245.4	76.4
	400	2.2	0.67	26.2	3.48	6.6	1.12	57.1	20.90	14.7	8.77	96.3	25.22	72.2	32.1	422.3	102.4
SeaweedFS	4	0.7	0.62	4.1	1.49	1.9	1.01	20.2	3.60	7.3	3.87	38.1	8.97	45.1	32.3	133.2	63.5
	40	1.2	0.61	19.1	4.11	3.9	1.02	36.5	20.60	13.1	5.71	69.2	15.12	65.2	41.4	200.1	100.3
	400	2.2	0.85	28.8	7.40	8.8	1.26	50.2	24.20	16.3	10.12	89.5	19.24	144.0	92.3	381.4	110.4
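The abstract limits its claim to workloads with more reads than writes, and the table shows the same pattern: at 5 000 threads the Kubestorage rows have the lowest mean read times of all four schemes but among the highest mean write times. A small calculation makes that trade-off concrete. The sketch below is a hypothetical illustration, not the authors' analysis: it uses only the 5 000-thread means transcribed from the table, assumes per-operation cost mixes linearly by read share (ignoring variance and read/write interference), and solves for the read share above which Kubestorage's mixed mean drops below SeaweedFS's. The names `MEANS_5000`, `mixed_latency` and `break_even_read_share` are illustrative.

```python
# Mean read and write times (ms) at 5 000 threads, transcribed from the table above.
# scheme -> {file size in KB: (read mean, write mean)}
MEANS_5000 = {
    "EXT4": {4: (80.5, 135.2), 40: (102.1, 170.4), 400: (133.4, 221.4)},
    "ZFS": {4: (87.4, 155.4), 40: (114.4, 212.2), 400: (194.9, 255.7)},
    "Kubestorage": {4: (36.3, 144.2), 40: (40.1, 245.4), 400: (72.2, 422.3)},
    "SeaweedFS": {4: (45.1, 133.2), 40: (65.2, 200.1), 400: (144.0, 381.4)},
}


def mixed_latency(read_ms, write_ms, read_share):
    """Expected time per operation when a fraction `read_share` of operations are reads.

    Assumes the two costs simply combine in proportion to their share, ignoring
    any interference between concurrent reads and writes.
    """
    return read_share * read_ms + (1.0 - read_share) * write_ms


def break_even_read_share(a, b):
    """Read share at which scheme `a` (faster reads) and scheme `b` break even.

    `a` and `b` are (read mean, write mean) pairs. Above the returned share,
    `a` has the lower mixed latency; 0.0 means `a` is never slower.
    """
    (ra, wa), (rb, wb) = a, b
    if ra >= rb:
        raise ValueError("scheme a must have the faster mean read time")
    if wa <= wb:
        return 0.0  # a is no slower for writes either
    return (wa - wb) / ((wa - wb) + (rb - ra))


if __name__ == "__main__":
    for size in (4, 40, 400):
        kube = MEANS_5000["Kubestorage"][size]
        seaweed = MEANS_5000["SeaweedFS"][size]
        share = break_even_read_share(kube, seaweed)
        print(
            f"{size} KB: Kubestorage overtakes SeaweedFS above {share:.0%} reads; "
            f"at a hypothetical 90% read share: {mixed_latency(*kube, 0.9):.1f} ms "
            f"vs {mixed_latency(*seaweed, 0.9):.1f} ms"
        )
```

With these numbers the break-even read share works out to roughly 56% for 4 KB files, 64% for 40 KB files and 36% for 400 KB files, so under this simplified model the Kubestorage rows only come out ahead once reads clearly dominate.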

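Table: average response time and variance of each storage scheme at each access stage (number of accesses)
Storage scheme	100 accesses: mean response time	100 accesses: variance	500 accesses: mean response time	500 accesses: variance	1 000 accesses: mean response time	1 000 accesses: variance	5 000 accesses: mean response time	5 000 accesses: variance	10 000 accesses: mean response time	10 000 accesses: variance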
EXT4	10.1	4.60	9.30	4.08	9.10	4.11	9.30	4.01	9.30	4.32
ZFS	22.4	12.42	22.70	12.01	20.40	12.23	22.30	13.01	22.00	12.34
Kubestorage	13.1	4.65	7.34^*	3.01	4.24^**	1.11	4.14	1.20	4.41	1.12
SeaweedFS	13.3	6.01	12.80	6.11	12.50	7.01	13.80	6.57	13.20	7.01
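In the access-stage table, the Kubestorage average response time falls from 13.1 at 100 accesses to about 4.2 to 4.4 from 1 000 accesses onward, while the other schemes change comparatively little. The short calculation below quantifies this as a relative change from the first stage to the last. It uses the table values as transcribed; the table gives no unit, and the meaning of the * and ** markers on two Kubestorage values is not stated, so the markers are omitted. The helper name `relative_change` is hypothetical.

```python
# Average response times at 100, 500, 1 000, 5 000 and 10 000 accesses,
# transcribed from the table above (unit not stated in the table).
AVG_RESPONSE = {
    "EXT4": [10.1, 9.30, 9.10, 9.30, 9.30],
    "ZFS": [22.4, 22.70, 20.40, 22.30, 22.00],
    "Kubestorage": [13.1, 7.34, 4.24, 4.14, 4.41],
    "SeaweedFS": [13.3, 12.80, 12.50, 13.80, 13.20],
}


def relative_change(series):
    """Fractional change from the first stage to the last (negative means faster)."""
    return (series[-1] - series[0]) / series[0]


if __name__ == "__main__":
    for scheme, series in AVG_RESPONSE.items():
        print(f"{scheme:12s} {relative_change(series):+.1%}")
```

Run as a script, it reports about -7.9% for EXT4, -1.8% for ZFS, -66.3% for Kubestorage and -0.8% for SeaweedFS.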

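Table: percentage of read and write operations completed within each time threshold, by storage scheme and file size
Storage scheme	File size/KB	Reads ≤ 0.5 ms/%	Reads ≤ 1 ms/%	Reads ≤ 5 ms/%	Reads ≤ 10 ms/%	Reads ≤ 15 ms/%	Writes ≤ 5 ms/%	Writes ≤ 10 ms/%	Writes ≤ 20 ms/%	Writes ≤ 50 ms/%	Writes ≤ 100 ms/%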
EXT4	4	22	36	44	100	100	37	67	100	100	100
	40	10	28	32	65	100	22	54	100	100	100
	400	5	13	44	67	92	10	60	79	100	100
ZFS	4	14	41	55	65	94	10	49	99	100	100
	40	2	13	29	45	61	8	22	49	97	100
	400	1	3	14	17	23	4	18	29	64	99
Kubestorage	4	40	68	82	100	100	18	27	39	86	100
	40	15	36	72	99	100	10	22	44	52	100
	400	7	20	54	75	100	7	22	36	45	96
SeaweedFS	4	33	67	76	100	100	19	30	44	92	100
	40	6	30	44	57	100	16	28	40	50	99
	400	6	20	37	56	77	11	25	41	47	99
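The "≤ x ms" columns read as cumulative shares: each column counts every operation at or below that time, which is why the values never decrease from left to right and stay at 100 once all samples are covered. A minimal sketch of how such columns could be derived from raw per-operation latencies follows, assuming those samples are available; the function name `share_within` and the sample values are hypothetical and are not measurements from the paper.

```python
from bisect import bisect_right


def share_within(latencies_ms, thresholds_ms):
    """Cumulative percentage of operations whose latency is <= each threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # bisect_right returns how many sorted samples are <= t.
    return [100.0 * bisect_right(ordered, t) / n for t in thresholds_ms]


# Thresholds used by the read and write columns of the table above.
READ_THRESHOLDS_MS = [0.5, 1, 5, 10, 15]
WRITE_THRESHOLDS_MS = [5, 10, 20, 50, 100]

if __name__ == "__main__":
    # Hypothetical read latencies, not measurements from the paper.
    samples = [0.3, 0.8, 0.9, 2.0, 7.5, 12.0, 0.4, 4.9, 9.9, 14.0]
    print(share_within(samples, READ_THRESHOLDS_MS))  # [20.0, 40.0, 60.0, 80.0, 100.0]
```

Sorting once and using binary search keeps the cost at O(n log n) for any number of thresholds, which matters when each run logs many thousands of operations.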