
DPCS2017+75+Real-time processing system for meteorological automatic station data under the Spark Streaming framework

ZHAO Wenfang, LIU Xulin

  1. Beijing Municipal Meteorological Bureau
  • Received: 2017-08-02 Revised: 2017-08-04 Published online: 2017-08-04
  • Corresponding author: ZHAO Wenfang

DPCS2017+75+Real-time processing system of meteorological automatic station data on Spark Streaming architecture

  • Received: 2017-08-02 Revised: 2017-08-04 Online: 2017-08-04
  • Contact: ZHAO Wenfang

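Abstract: The existing operational platform for automatic weather station (AWS) data suffers from untimely data processing, slow interactive response, and poor statistical timeliness. To address these problems, a method based on Spark Streaming and HBase is proposed, combining a real-time computing framework with a distributed database system to process large-scale streaming data. Flume collects the AWS data, and Spark Streaming processes the data as a stream and stores it in the HBase database. Two algorithms are designed under the Spark framework: one for streaming ingestion of AWS data into the database, and one for real-time statistics of meteorological element extremes. On this basis, a fast and reliable application system for real-time acquisition, processing, and statistics is implemented on the Cloudera platform. Comparative analysis and performance monitoring verify that the system offers low latency and high throughput, runs well, and keeps its load balanced. The experimental results show that when Spark Streaming is used for real-time operational processing of AWS data, parallel writes to HBase, HBase-based queries, and statistics of the various elements all achieve millisecond-level response. This fully meets the application requirements for AWS data and effectively supports weather forecasting operations.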
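The abstract does not spell out the streaming ingestion algorithm. As a rough illustration only, the following Python sketch shows the per-micro-batch step such an algorithm needs: turning raw AWS text records into HBase-style puts (a row key plus column-qualifier/value pairs). The record format, the `station_id_obs_time` row-key layout, the `obs` column family, and the function names `parse_record` and `batch_to_puts` are all hypothetical, and the HBase client call that would actually write the puts is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: "station_id,obs_time,ELEMENT=value;ELEMENT=value"
# e.g. "54511,201708021200,TEM=28.3;RHU=65"

Put = Tuple[bytes, Dict[bytes, bytes]]  # (row key, {b"family:qualifier": value})


def parse_record(line: str) -> Put:
    """Parse one raw AWS line into a hypothetical HBase put; raises ValueError if malformed."""
    station, obs_time, payload = line.strip().split(",", 2)
    # Hypothetical row key: station ID followed by observation time, so that the
    # rows of one station are stored contiguously and sorted by time.
    row_key = f"{station}_{obs_time}".encode()
    columns: Dict[bytes, bytes] = {}
    for pair in payload.split(";"):
        element, value = pair.split("=", 1)
        # Single hypothetical column family "obs"; each element becomes one qualifier.
        columns[f"obs:{element}".encode()] = value.encode()
    return row_key, columns


def batch_to_puts(lines: Iterable[str]) -> List[Put]:
    """Convert one micro-batch of raw lines into puts, skipping malformed records."""
    puts: List[Put] = []
    for line in lines:
        try:
            puts.append(parse_record(line))
        except ValueError:
            continue  # malformed line; a real system would log it or route it elsewhere
    return puts
```

In a PySpark job, a function like `batch_to_puts` could be applied inside `DStream.foreachRDD` and `RDD.foreachPartition`, so that each partition opens one HBase connection and writes its puts in bulk; that wiring is an assumption, not the authors' code.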

Keywords: automatic weather station, Spark Streaming, stream computing, data processing, Flume

Abstract: To address the problems of the current Automatic Weather Station (AWS) data service, including delayed data processing, slow interactive response, and low statistical efficiency, a new method based on Spark Streaming and HBase was proposed for processing massive streaming AWS data by integrating a stream computing framework with a distributed database system. The method uses Flume for data collection and Spark Streaming for data processing and for storing the data in HBase. Within the Spark framework, two algorithms were designed: one for writing streaming AWS data into the HBase database, and the other for real-time statistical calculation of the different meteorological elements observed by AWS. Finally, a stable and efficient system for real-time acquisition, processing, and statistics of AWS data was developed on the Cloudera platform. Comparative analysis and runtime monitoring confirmed the robust performance of the system, including low latency, high I/O efficiency, stable operation, and excellent load balance. The test results show that in Spark Streaming-based real-time operational processing of AWS data, parallel data writes into HBase, HBase-based data queries, and statistics on different meteorological elements all reach millisecond-level response times. The system fully meets the needs of operational AWS data applications and provides effective support for weather forecasting.
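The real-time extreme-value statistics can similarly be sketched as stateful per-key aggregation across micro-batches. The Python sketch below is a simplified, Spark-free model of the kind of update function one would pass to `DStream.updateStateByKey`: for each hypothetical (station ID, element) key, it merges the current batch's values into a running (minimum, maximum). The key layout and the names `update_extremes` and `process_batch` are illustrative assumptions, not the authors' algorithm.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]        # hypothetical key: (station_id, element)
State = Tuple[float, float]  # (running minimum, running maximum)


def update_extremes(new_values: List[float], state: Optional[State]) -> Optional[State]:
    """Merge one batch's values for a key into its running (min, max).

    Same shape as an updateStateByKey update function: it receives the values
    seen for a key in the current batch plus the previous state, and returns
    the new state.
    """
    if not new_values:
        return state  # no new observations for this key: keep the previous extremes
    lo, hi = min(new_values), max(new_values)
    if state is None:
        return (lo, hi)  # first observations ever seen for this key
    return (min(state[0], lo), max(state[1], hi))


def process_batch(batch: Iterable[Tuple[Key, float]],
                  states: Dict[Key, State]) -> Dict[Key, State]:
    """Simulate one micro-batch: group values by key, then update each key's state."""
    grouped: Dict[Key, List[float]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_states = dict(states)  # keys absent from this batch keep their old extremes
    for key, values in grouped.items():
        new_states[key] = update_extremes(values, states.get(key))
    return new_states
```

Because Spark Streaming calls an `updateStateByKey` function for every key that already has state, including keys with no new values in the current batch, `update_extremes` returns the previous state unchanged when `new_values` is empty.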

Keywords: automatic weather station, Spark Streaming, stream computing, data processing, Flume
