Journal of Computer Applications, 2020, Vol. 40, Issue (2): 547-552. DOI: 10.11772/j.issn.1001-9081.2019101732

• CCF Bigdata 2019 •

Design and implementation of a cloud native massive data storage system based on Kubernetes

Fuxin LIU, Jingwei LI, Yihong WANG, Lin LI

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan Hubei 430070, China
  • Received: 2019-08-30 Revised: 2019-10-14 Accepted: 2019-10-18 Online: 2019-11-18 Published: 2020-02-10
  • Contact: Fuxin LIU
  • About the authors: LI Jingwei, born in 1999. His research interests include machine learning and data mining.
    WANG Yihong, born in 1999. His research interests include algorithm optimization and its applications.
    LI Lin, born in 1977. Ph.D., professor. Her research interests include information retrieval and recommender systems.
  • Supported by:
    the National Innovation and Entrepreneurship Training Program for Undergraduates (20181049710013)

Design and implementation of a cloud native massive data storage system based on Kubernetes

LIU Fuxin (刘福鑫), LI Jingwei (李劲巍), WANG Yihong (王熠弘), LI Lin (李琳)

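  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  • Corresponding author: LIU Fuxin (刘福鑫)
  • About the authors: LI Jingwei (李劲巍), male, born in 1999 in Songyang, Zhejiang. Main research interests: machine learning, data mining.
    WANG Yihong (王熠弘), male, born in 1999 in Binzhou, Shandong. Main research interests: computer algorithm optimization and its applications.
    LI Lin (李琳), female, born in 1977 in Hengyang, Hunan. Professor, Ph.D., CCF member. Main research interests: information retrieval, recommender systems.
  • Funding:
    National Innovation and Entrepreneurship Training Program for Undergraduates (20181049710013)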

Abstract:

To address the sharp growth of data on the cloud brought about by the development and spread of cloud native technology, as well as the performance and stability bottlenecks of that technology, a Haystack-based storage system is proposed. With optimizations in service discovery, automatic fault tolerance, and caching, the system is better suited to cloud native business and meets the growing, high-frequency file storage and read/write demands of the data acquisition, storage, and analysis industries. The system uses an object storage model to support massive file storage with high-frequency reads and writes, provides a simple, unified application programming interface to the business applications that use it, and applies a file caching strategy to improve resource utilization. It also adopts the rich automated tool chain of Kubernetes to make the system easier to deploy, easier to scale, and more stable than other storage systems. Experimental results indicate that, for large-scale fragmented data storage with more reads than writes, the proposed system improves performance and stability to some degree over current mainstream object storage and file systems.
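The Haystack-style object model and the file cache mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`HEADER`), the class names `Volume` and `CachedVolume`, and the choice of an LRU eviction policy are all assumptions made for illustration, since the abstract does not specify them. The sketch shows only the core idea: many small objects are appended to one large volume, an in-memory index maps each key to its offset and size, and a small cache serves repeated reads.

```python
import io
import struct
from collections import OrderedDict

# Hypothetical record header: 8-byte key, 4-byte payload size (big-endian).
HEADER = struct.Struct(">QI")


class Volume:
    """Append-only volume that packs many small objects into one file."""

    def __init__(self, backing=None):
        # An in-memory buffer stands in for a large on-disk volume file.
        self.f = backing if backing is not None else io.BytesIO()
        self.index = {}  # key -> (payload offset, payload size)

    def put(self, key, data):
        # Append header and payload at the end; record where the payload starts.
        self.f.seek(0, io.SEEK_END)
        offset = self.f.tell()
        self.f.write(HEADER.pack(key, len(data)))
        self.f.write(data)
        self.index[key] = (offset + HEADER.size, len(data))

    def get(self, key):
        # One index lookup plus one seek and read, with no directory traversal.
        offset, size = self.index[key]
        self.f.seek(offset)
        return self.f.read(size)


class CachedVolume:
    """LRU read cache in front of a volume (the policy is an assumption)."""

    def __init__(self, volume, capacity=2):
        self.volume = volume
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0

    def put(self, key, data):
        self.volume.put(key, data)
        self.cache.pop(key, None)  # drop any stale cached copy

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        data = self.volume.get(key)
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data
```

Because a read needs only an in-memory lookup and a single seek, this layout favors workloads with many small files and more reads than writes, which is the scenario the abstract evaluates.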

Key words: file system, object storage, cloud computing, container orchestration, cloud native business

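Abstract:

To cope with the surge in cloud data volume that accompanies the growing development and popularization of cloud native technology, and with the bottlenecks this technology has encountered in performance and stability, a Haystack-based storage system is proposed. The storage system is optimized in service discovery, automatic fault tolerance, and caching, making it better suited to cloud native business, so as to meet the ever-growing and high-frequency file storage and read/write needs of the data acquisition, storage, and analysis industries. It uses an object storage model to handle high-frequency, massive file storage, provides a simple and unified application programming interface for the businesses that use it, applies a file caching strategy to improve resource utilization, and uses the rich automated tool chain of Kubernetes to make it easier to deploy and scale, and more stable, than other storage systems. Experimental results show that, in large-scale fragmented data storage scenarios with more reads than writes, the storage system achieves a certain improvement in performance and stability over current mainstream object storage and file systems.

Key words: file system, object storage, cloud computing, container orchestration, cloud native business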
