Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (9): 2617-2621. DOI: 10.11772/j.issn.1001-9081.2014.09.2617

• Data Technology •

HBase-based distributed storage system for meteorological ground minute data

CHEN Donghui, ZENG Le, LIANG Zhongjun, XIAO Weiqing

  1. Engineering System Division, National Meteorological Information Center, Beijing 100081, China
  • Received: 2014-03-12 Revised: 2014-04-19 Online: 2014-09-01 Published: 2014-09-30
  • Contact: CHEN Donghui
  • About the authors:
    CHEN Donghui (1984-), male, born in Nanyang, Henan, engineer, Ph.D.; research interests: distributed computing, databases;
    ZENG Le (1977-), female, born in Changsha, Hunan, engineer, Ph.D.; research interests: large-scale data analysis, distributed data processing;
    LIANG Zhongjun (1984-), male, born in Kashgar, Xinjiang, engineer, Ph.D.; research interests: Web data management, massive information processing;
    XIAO Weiqing (1984-), male, born in Baoding, Hebei, engineer, M.S.; research interests: parallel computing, databases.
  • Supported by:

    Youth Science and Technology Fund of the National Meteorological Information Center

Abstract:

Meteorological ground minute data comprise many elements, carry a large volume of information, and are generated at high frequency. Traditional relational database systems that store and manage such data suffer from saturated load and unsatisfactory read and write performance. Based on a study of the storage model of the distributed database HBase, a storage structure model for meteorological minute data was designed in which the row key combines the time with the station number, realizing distributed storage of massive meteorological data together with meta-information management. Because HBase indexes data only by row key, response times become too long for the complex query cases found in meteorological operations. To address this, the API of the search engine Solr was used to build secondary indexes on the relevant fields, chosen with reference to operational query cases, so that retrieval meets operational timeliness requirements. Experimental results show that the system has good storage capacity and retrieval efficiency: the ingest rate reaches up to 34000 records per second, and results for routine query cases are returned within milliseconds. The system can therefore meet the storage and query performance requirements of large-scale meteorological data in operational applications.
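
The row key design mentioned in the abstract (time plus station number) can be illustrated with a minimal sketch. The abstract does not give the exact key layout, so the 12-digit minute timestamp, the 5-character station number, the function names `make_row_key` and `scan_time_range`, and the sample station numbers are all assumptions made for illustration. The sketch uses a sorted Python list in place of HBase to show why a time-leading key keeps all records of one time window contiguous, so a time-range query becomes a single range scan.

```python
from bisect import bisect_left
from datetime import datetime

TIME_FORMAT = "%Y%m%d%H%M"  # assumed layout: minute-resolution timestamp


def make_row_key(obs_time: datetime, station_id: str) -> str:
    """Compose a row key as time followed by station number (assumed layout)."""
    return obs_time.strftime(TIME_FORMAT) + station_id.zfill(5)


def scan_time_range(sorted_keys, start: datetime, end: datetime):
    """Return keys with start <= time < end from a lexicographically sorted list.

    This mimics a range scan over sorted row keys; it is not an HBase call.
    """
    lo = bisect_left(sorted_keys, start.strftime(TIME_FORMAT))
    hi = bisect_left(sorted_keys, end.strftime(TIME_FORMAT))
    return sorted_keys[lo:hi]


# Three minutes of observations from two illustrative station numbers.
keys = sorted(
    make_row_key(datetime(2014, 3, 12, 8, minute), station)
    for minute in (0, 1, 2)
    for station in ("54511", "58362")
)

hits = scan_time_range(
    keys, datetime(2014, 3, 12, 8, 1), datetime(2014, 3, 12, 8, 2)
)
print(hits)  # ['20140312080154511', '20140312080158362']
```

A key that leads with time serves "all stations at a given minute" efficiently, whereas queries on other fields cannot use the row key at all, which is the gap the Solr secondary indexes described in the abstract are meant to fill.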

CLC Number: