Journal of Computer Applications, 2015, Vol. 35, Issue (1): 103-107. DOI: 10.11772/j.issn.1001-9081.2015.01.0103

HBase-based real-time storage system for traffic stream data

LU Ting, FANG Jun, QIAO Yanke   

  1. Research Center for Cloud Computing, North China University of Technology, Beijing 100041, China
  • Received: 2014-07-18 Revised: 2014-09-09 Online: 2015-01-01 Published: 2015-01-26

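  • Corresponding author: FANG Jun
  • About the authors: LU Ting (born 1990), female, from Heze, Shandong, M.S. candidate; main research interest: storage optimization for stream data. FANG Jun (born 1976), male, from Nanjing, Jiangsu, associate research fellow; main research interests: service computing and cloud computing. QIAO Yanke (born 1989), male, from Luohe, Henan, M.S. candidate; main research interest: cloud data processing.
  • Supported by:

    Key Program of the Beijing Natural Science Foundation (4131001); Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities (IDHT20130502); Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.; Scientific Research Start-up Fund of North China University of Technology.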

Abstract:

Traffic stream data is multi-source, high-speed, and large in volume. When handling such data, traditional storage methods and systems scale poorly and cannot keep up with real-time storage demands. To address these problems, this work designs and implements an HBase-based real-time storage system for traffic stream data. The system adopts a distributed storage architecture, normalizes data through front-end preprocessing, uses a multi-source buffer structure to separate different kinds of stream data into different queues, and combines a consistent hashing algorithm, multithreading, and an optimized row-key design to write data into the HBase cluster in parallel. Experimental results show that the system's storage performance is 3 to 5 times that of an Oracle-based real-time storage system and 2 to 3 times that of native HBase, and that the system scales well.

Key words: streaming data, multi-source buffer, data sharding, consistent hashing algorithm, real-time storage, HBase
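
The abstract names consistent hashing as the way records are distributed across parallel writers but gives no implementation details. The following is a minimal Python sketch of the general technique, not the authors' code: the class name `ConsistentHashRing`, the server names, the use of MD5, and the number of virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring that assigns record keys to storage nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per server (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the node owning the first ring position clockwise of key."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this layout, adding a server moves only the keys that fall onto the new server's ring positions; all other keys keep their previous owner, which is the property that makes the scheme attractive when a cluster grows.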

CLC Number: