Visibility forecast model based on LightGBM algorithm

YU Dongchang, ZHAO Wenfang, NIE Kai, ZHANG Ge

Journal of Computer Applications 2021, 41 (4): 1035-1041. DOI: 10.11772/j.issn.1001-9081.2020081589

To improve the accuracy of visibility forecasts, especially low-visibility forecasts, an ensemble learning model based on random forest and LightGBM was proposed. Firstly, feature vectors were constructed with the random forest method from the meteorological forecast data of a numerical modeling system, combined with meteorological observation data and PM2.5 concentration observation data. Secondly, three missing-value processing methods were designed to replace missing data spanning different lengths of time, producing a sample set with good continuity for training and testing. Finally, a visibility forecast model based on LightGBM was established, and its parameters were optimized with the grid search method. The proposed model was compared with Support Vector Machine (SVM), Multiple Linear Regression (MLR) and Artificial Neural Network (ANN) models. Experimental results show that, across the different levels of visibility, the proposed LightGBM-based model obtains the highest Threat Score (TS); when the visibility is less than 2 km, the average correlation coefficient between the predicted and observed visibility at the observation stations is 0.75, and the average mean square error between them is 6.49. The LightGBM-based forecast model can therefore effectively improve the accuracy of visibility forecasts.
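
The Threat Score used in the evaluation above can be illustrated with a small sketch. It uses the standard definition TS = hits / (hits + misses + false alarms); treating "visibility below a threshold" as the forecast event, the function name `threat_score`, and the sample values are all assumptions made for illustration, not details taken from the paper.

```python
def threat_score(observed_km, forecast_km, threshold_km):
    """Threat Score for the event 'visibility < threshold_km'.

    TS = hits / (hits + misses + false_alarms). Correct negatives
    (neither observed nor forecast) do not enter the score.
    """
    hits = misses = false_alarms = 0
    for obs, fc in zip(observed_km, forecast_km):
        obs_event = obs < threshold_km
        fc_event = fc < threshold_km
        if obs_event and fc_event:
            hits += 1            # event observed and forecast
        elif obs_event:
            misses += 1          # event observed but not forecast
        elif fc_event:
            false_alarms += 1    # event forecast but not observed
    total = hits + misses + false_alarms
    # If the event never occurs and is never forecast, TS is undefined.
    return hits / total if total else float("nan")


# Made-up visibility values in km, for illustration only.
observed = [0.8, 1.5, 3.0, 5.0, 1.2, 10.0]
forecast = [1.0, 2.5, 1.8, 6.0, 0.9, 12.0]
ts = threat_score(observed, forecast, threshold_km=2.0)
print(ts)  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

Because correct negatives are ignored, TS is not inflated by the many hours of good visibility, which is why it suits rare events such as low visibility.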

Forecasting model of pollen concentration based on particle swarm optimization and support vector machine

ZHAO Wenfang, WANG Jingli, SHANG Min, LIU Yanan

Journal of Computer Applications 2019, 39 (1): 98-104. DOI: 10.11772/j.issn.1001-9081.2018071626

To address the low accuracy of current pollen concentration forecast models, a model for daily pollen concentration forecasting based on the Particle Swarm Optimization (PSO) algorithm and Support Vector Machine (SVM) was proposed. Firstly, the feature vector was extracted by using correlation analysis to select meteorological data strongly correlated with pollen concentration, such as temperature, daily temperature difference, relative humidity, precipitation, wind and sunshine hours. Secondly, an SVM prediction model was established from this feature vector and pollen concentration observation data. The PSO algorithm was designed to optimize the parameters of the SVM algorithm, and the optimal parameters were then used to construct the daily pollen concentration prediction model. Finally, the optimized SVM model was used to forecast pollen concentration 24 hours in advance. Its accuracy was compared with that of the Multiple Linear Regression (MLR) model and the Back Propagation Neural Network (BPNN) model. In addition, the optimized model was applied to forecasting pollen concentration 24 hours in advance at the Nanjiao and Miyun meteorological observation stations. The experimental results show that the proposed method performs better than the MLR and BPNN methods, provides promising results for forecasting pollen concentration 24 hours in advance, and has good generalization ability.
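
The parameter search described above can be sketched with a generic, textbook PSO loop. This is not the authors' implementation: the function `pso_minimize`, the inertia and acceleration coefficients, and the toy objective `toy_cv_error` (a stand-in for the cross-validation error of an SVM as a function of two hypothetical log-scaled parameters) are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(params):
    # Hypothetical stand-in for SVM validation error; minimum at (1.0, -2.0).
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, best_err = pso_minimize(toy_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, best_err)
```

In a real setting, the objective would train an SVM with the candidate parameters and return its validation error, so each evaluation is far more expensive than in this toy example.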

Real-time processing system for automatic weather station data on Spark Streaming architecture

ZHAO Wenfang, LIU Xulin

Journal of Computer Applications 2018, 38 (1): 38-43. DOI: 10.11772/j.issn.1001-9081.2017071903

To address the problems of the current data service for Automatic Weather Stations (AWS), including data processing delay, slow interactive response and low statistical efficiency, a new method based on Spark Streaming and HBase was proposed to process massive streaming AWS data by integrating a stream computing framework with a distributed database system. Flume was used for data collection, Spark Streaming for data processing, and HBase for data storage. Within the Spark framework, two algorithms were designed: one for writing streaming AWS data into the HBase database, the other for real-time statistical calculation on the different observed AWS meteorological elements. Finally, a stable and highly efficient system for real-time acquisition, processing and statistics of AWS data was developed on the Cloudera platform. Comparative analysis and runtime monitoring confirmed the system's performance, including low latency, high I/O efficiency, stable running status and excellent load balance. The experimental results show that the response time of Spark Streaming-based real-time operational processing of AWS data can reach the millisecond level, covering parallel data writing into HBase, HBase-based data query and statistics on different meteorological elements. The system can fully meet the needs of operational applications of AWS data and provides effective support for weather forecasting.
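
The real-time statistics step can be pictured as a stateful aggregation that is updated with every micro-batch. The sketch below is plain Python, not Spark Streaming or HBase code: it only imitates the idea of keeping running statistics per (station, element) key across batches. The record layout, the station identifiers and the names `RunningStats` and `process_batch` are hypothetical.

```python
from collections import defaultdict


class RunningStats:
    """Count, sum, minimum and maximum that can be updated incrementally."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")


def process_batch(state, batch):
    """Fold one micro-batch of (station_id, element, value) records into the state."""
    for station_id, element, value in batch:
        state[(station_id, element)].update(value)
    return state


# State survives across batches, as it would in a stateful streaming job.
state = defaultdict(RunningStats)

# Two made-up micro-batches; station IDs and values are illustrative only.
process_batch(state, [("S001", "temperature", 20.0), ("S002", "temperature", 18.0)])
process_batch(state, [("S001", "temperature", 22.0), ("S001", "temperature", 21.0)])

s001 = state[("S001", "temperature")]
print(s001.count, s001.mean, s001.minimum, s001.maximum)  # 3 21.0 20.0 22.0
```

Keeping only counts, sums and extremes means each batch is processed in time proportional to its own size, independent of how much history has already been seen.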
