Journal of Computer Applications, 2018, Vol. 38, Issue (5): 1339-1345. DOI: 10.11772/j.issn.1001-9081.2017102475

• Data Science and Technology •

Storage method for a flight delay platform based on HBase and Hive

WU Renbiao, LIU Chao, QU Jingyi

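  1. Tianjin Key Laboratory for Advanced Signal Processing, Civil Aviation University of China, Tianjin 300300
  • Received: 2017-10-19 Revised: 2017-12-29 Online: 2018-05-10 Published: 2018-05-24
  • Corresponding author: QU Jingyi
  • About the authors: WU Renbiao (born 1966), male, from Wuhan, Hubei; professor, doctoral supervisor, Ph.D.; research interests: adaptive signal processing, modern spectral analysis and their applications in radar, satellite navigation and air traffic management. LIU Chao (born 1991), male, from Huanggang, Hubei; master's student; research interest: air transportation big data. QU Jingyi (born 1978), female, from Tianjin; associate professor, Ph.D.; research interests: air transportation big data, neural networks.
  • Funding:
    National Natural Science Foundation of China (11402294); Open Fund Project of the Tianjin Key Laboratory of Intelligent Signal and Image Processing (2017ASP-TJ01).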

Storage method for flight delay platform based on HBase and Hive

WU Renbiao, LIU Chao, QU Jingyi   

  1. Tianjin Key Laboratory for Advanced Signal Processing, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2017-10-19 Revised: 2017-12-29 Online: 2018-05-10 Published: 2018-05-24
  • Contact: QU Jingyi
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (11402294) and the Tianjin Intelligent Signal and Image Processing Key Laboratory Open Fund Project (2017ASP-TJ01).

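Abstract (translated from the Chinese): China's current flight delay platforms are hard to port and scale poorly, so they cannot cope with the large-volume data storage brought by the rapid development of civil aviation. To address this, a big-data-oriented flight delay platform that is cross-platform, highly applicable and highly scalable was designed. The platform uses the big data tool LeafLet as its visualization carrier, displays flight trajectories in real time on a map interface, and loads the trajectory data into an HBase database. The Message-Digest Algorithm 5 (MD5) is used to redesign and optimize the row key of the flight data table, which solves the "hotspot" problem caused by monotonically increasing flight times. To overcome the shortcomings of multi-level queries with HBase filters, an associated query algorithm based on SolrCloud is proposed; it uses SolrCloud to store row keys and index fields in separate layers and thereby provides fast secondary indexing for HBase. Finally, a Hive-based data warehouse for massive flight information is built on the historical flight data and flight plan data held in HBase. The experimental results show that the scalability of the flight delay big data platform and the flight information data warehouse meet civil aviation's need for centralized, unified data storage, that multi-condition queries respond hundreds of times faster than on a cluster without a secondary index, and that this advantage becomes more pronounced as the volume of flight data grows.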

Keywords (translated from the Chinese): big data platform, flight delay, HBase, Hive, SolrCloud, LeafLet
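
The row key redesign mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual key layout, so everything here is an assumption made for illustration: the function name `salted_rowkey`, the `flight_id` and `timestamp` fields, the underscore separator and the 4-character prefix length are all hypothetical. The idea shown is only the general one: prefixing a time-ordered key with part of its MD5 digest makes consecutive keys sort far apart, so writes are spread over regions instead of piling onto one.

```python
import hashlib


def salted_rowkey(flight_id: str, timestamp: str, prefix_len: int = 4) -> str:
    """Build a row key whose leading characters come from an MD5 digest.

    Hypothetical layout (not taken from the paper):
        <first prefix_len hex chars of MD5(natural key)>_<flight_id>_<timestamp>
    """
    natural_key = f"{flight_id}_{timestamp}"
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{natural_key}"


# Five consecutive minutes for one (made-up) flight. The natural keys are
# strictly increasing, but the hashed prefixes decide the sort order.
keys = [salted_rowkey("CA1234", f"2017101908{minute:02d}") for minute in range(5)]
for key in sorted(keys):
    print(key)
```

Because the prefix is derived from the natural key itself, a reader who knows the flight and time can recompute the full row key for a point lookup; the trade-off is that range scans over time are no longer contiguous.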

Abstract: Existing flight delay platforms in China are hard to port and scale poorly, so they cannot cope with the large-volume data storage brought by the rapid development of Chinese civil aviation. To address this problem, a cross-platform, highly applicable and highly scalable flight delay big data platform was designed. The platform used the big data tool LeafLet as its visualization carrier, displayed flight trajectories in real time on the map interface, and loaded the trajectory data into an HBase database. The Message-Digest Algorithm 5 (MD5) was used to redesign and optimize the rowkey of the flight data table, solving the "hot spot" problem caused by monotonically increasing flight times. To overcome the shortcomings of multi-level queries with HBase filters, an associated query algorithm based on SolrCloud was proposed, which used SolrCloud to store rowkeys and index fields in separate layers and thereby provided fast secondary indexing for HBase. Finally, a Hive-based data warehouse for massive flight information was constructed on the historical flight data and flight plan data held in HBase. The experimental results show that the scalability of the flight delay big data platform and the flight information data warehouse can meet civil aviation's demand for centralized, unified data storage, that multi-condition queries respond hundreds of times faster than on a cluster without a secondary index, and that this advantage becomes more pronounced as the amount of flight data grows.

Keywords: big data platform, flight delay, HBase, Hive, SolrCloud, LeafLet
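
The secondary-index idea in the abstract (index fields resolved to rowkeys first, rows fetched by rowkey second) can be sketched without any cluster. This is not the paper's algorithm and uses neither the Solr nor the HBase client: plain Python dictionaries stand in for both stores, and the table contents, field names and rowkey prefixes are invented for illustration.

```python
from collections import defaultdict

# Stand-in for the HBase table: rowkey -> column values (hypothetical data).
table = {
    "a1_CA1234_20171019": {"carrier": "CA", "origin": "PEK", "delay_min": 35},
    "7f_MU5678_20171019": {"carrier": "MU", "origin": "PVG", "delay_min": 0},
    "c3_CA4321_20171020": {"carrier": "CA", "origin": "TSN", "delay_min": 50},
    "0e_CZ8765_20171020": {"carrier": "CZ", "origin": "PEK", "delay_min": 12},
}


def build_index(rows, fields):
    """Stand-in for the index layer: (field, value) -> set of rowkeys."""
    index = defaultdict(set)
    for rowkey, row in rows.items():
        for field in fields:
            index[(field, row[field])].add(rowkey)
    return index


def indexed_query(index, rows, conditions):
    """Step 1: intersect the rowkey sets of all conditions.
    Step 2: fetch only the matching rows by rowkey (point lookups)."""
    rowkey_sets = [index.get((f, v), set()) for f, v in conditions.items()]
    matching = set.intersection(*rowkey_sets) if rowkey_sets else set(rows)
    return {rowkey: rows[rowkey] for rowkey in sorted(matching)}


def scan_query(rows, conditions):
    """Baseline without an index: test every row against every condition."""
    return {
        rowkey: row
        for rowkey, row in sorted(rows.items())
        if all(row[f] == v for f, v in conditions.items())
    }


index = build_index(table, ["carrier", "origin"])
result = indexed_query(index, table, {"carrier": "CA", "origin": "PEK"})
print(result)
```

Both query functions return the same rows; the difference is that the indexed version touches only the rows that match, whereas the scan inspects the whole table, which is the cost that grows with data volume.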

CLC number: