Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (3): 670-674. DOI: 10.11772/j.issn.1001-9081.2016.03.670

• Big Data •

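Non-relational data storage management mechanism for massive unstructured data

LIU Chao1,2, HU Chengyu2, YAO Hong2, LIANG Qingzhong2, YAN Xuesong2

  1. Key Laboratory of Services Computing Technology and System of Ministry of Education, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China;
  2. Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences (Wuhan), Wuhan, Hubei 430074, China
  • Received: 2015-08-31 Revised: 2015-10-01 Online: 2016-03-10 Published: 2016-03-17
  • Corresponding author: HU Chengyu
  • About the authors: LIU Chao (born 1979), male, from Wuhan, Hubei, lecturer, Ph.D. candidate, CCF member; research interests: cloud computing, distributed systems. HU Chengyu (born 1978), male, from Xiangyang, Hubei, associate professor, Ph.D.; research interests: cloud computing, water pipe network optimization. YAO Hong (born 1976), male, from Xuchang, Henan, associate professor, Ph.D.; research interests: mobile computing, Internet of Things. LIANG Qingzhong (born 1979), male, from Guilin, Guangxi, lecturer, Ph.D.; research interests: mobile Internet and optimization. YAN Xuesong (born 1977), male, from Ji'an, Jiangxi, associate professor, Ph.D.; research interests: water pipe network optimization, evolutionary computation.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61305087, 61272470, 61440060, 61501412), the Key Project of the Natural Science Foundation of Hubei Province (2015CFA065), the China Postdoctoral Science Foundation (2014M562086) and the Fundamental Research Funds for the Central Universities (CUGL130233).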

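Abstract: Traditional relational data storage systems suffer from insufficient performance and poor fault tolerance, and therefore cannot meet the demands of managing massive unstructured data. To address this problem, a high-performance, high-availability non-relational storage management mechanism is proposed. First, a user-friendly access service interface is designed, and an efficient consistent hashing algorithm distributes data across multiple storage nodes. Second, a configurable data replication mechanism improves the availability of the storage system. Finally, a query fault handling mechanism is proposed to improve the fault tolerance of the storage system and to avoid service interruptions caused by node failures. Experimental results show that, under user workloads of different scales, the concurrent access capacity of the new storage system is 30% and 50% higher than that of a traditional file system and a relational database, respectively; meanwhile, within a reasonable response time, the availability loss of the storage system in the failure state is less than 14%. The mechanism is therefore suitable for efficient storage management of massive unstructured data.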

Key words: unstructured data, massive data storage, non-relational storage management, consistent hashing, fault handling
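
The three ingredients named in the abstract (consistent hashing, a configurable number of replicas, and query-time fault handling) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HashRing`, its parameters `replicas` and `vnodes`, the use of SHA-256, the node names, and the "skip failed replicas" read policy are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch only)."""

    def __init__(self, nodes, replicas=2, vnodes=64):
        self.replicas = replicas   # copies kept per key (assumed configurable)
        self.failed = set()        # nodes currently marked as down
        # Each physical node is placed on the ring at `vnodes` hashed positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of SHA-256 as an integer ring position (an assumption).
        return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")

    def preference_list(self, key):
        """Return up to `replicas` distinct nodes clockwise from the key."""
        start = bisect.bisect_right(self._points, self._hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.replicas:
                    break
        return chosen

    def node_for_read(self, key):
        """Return the first live replica for `key`, or None if all are down."""
        for node in self.preference_list(key):
            if node not in self.failed:
                return node
        return None


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
owners = ring.preference_list("photo-0001.jpg")
ring.failed.add(owners[0])                        # simulate a node failure
fallback = ring.node_for_read("photo-0001.jpg")   # falls through to the second replica
```

In this sketch, raising `replicas` trades storage for availability: a read fails only when every node in the key's preference list is marked as failed.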

CLC number: