Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1394-1400.DOI: 10.11772/j.issn.1001-9081.2022030437

• China Conference on Data Mining 2022 (CCDM 2022)

Extraction of PM2.5 diffusion characteristics based on candlestick pattern matching

Rui XU1, Shuang LIANG1, Hang WAN2, Yimin WEN1, Shiming SHEN3, Jian LI1

    1. College of Computer and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541004, China
    2. Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou Guangdong 511458, China
    3. Satellite Navigation Positioning and Location Service National and Local Joint Engineering Research Center (Guilin University of Electronic Technology), Guilin Guangxi 541004, China
  • Received:2022-04-06 Revised:2022-06-02 Accepted:2022-06-15 Online:2023-05-08 Published:2023-05-10
  • Contact: Hang WAN
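  • About author: XU Rui, born in 1977, Ph. D., associate professor. His research interests include artificial intelligence, deep learning and environmental big data, environmental monitoring instrumentation, environmental remote sensing and geographic information systems.
    LIANG Shuang, born in 1994, M. S. candidate. Her research interests include environmental forecasting, deep learning and environmental big data.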
    WAN Hang, born in 1989, Ph. D., assistant research fellow. His research interests include deep learning and environmental big data.
    WEN Yimin, born in 1969, Ph. D., professor. His research interests include machine learning and data mining, recommender systems, computer vision, big data analysis, artificial intelligence security, and media analysis.
    SHEN Shiming, born in 1985, M. S., assistant research fellow. His research interests include environmental big data analysis.
    LI Jian, born in 1991, M. S., assistant research fellow. His research interests include deep learning and environmental remote sensing.
  • Supported by:
    Guangxi Natural Science Foundation (2021JJA170096); Guangxi Key Research and Development Program (AB21196063); Major Achievement Transformation Foundation of Guilin (20192013-1); Innovation and Entrepreneurship Training Program for College Students of Guilin University of Electronic Technology (202010595031)

基于烛台图模式匹配的PM2.5扩散特征的提取

许睿1, 梁爽1, 万航2, 文益民1, 沈世铭3, 李建1

    1.桂林电子科技大学 计算机与信息安全学院,广西 桂林 541004
    2.南方海洋科学与工程广东省实验室(广州),广州 511458
    3.卫星导航定位与位置服务国家地方联合工程研究中心(桂林电子科技大学),广西 桂林 541004
  • 通讯作者: 万航
  • 作者简介:许睿(1977—),男,四川成都人,副教授,博士,CCF会员,主要研究方向:人工智能、深度学习与环境大数据、环境监测仪器仪表、环境遥感与地理信息系统
    梁爽(1994—),女,天津人,硕士研究生,CCF会员,主要研究方向:环境预测、深度学习与环境大数据
    万航(1989—),男,广东广州人,助理研究员,博士,CCF会员,主要研究方向:深度学习与环境大数据 wanhang@gmlab.ac.cn
    文益民(1969—),男,湖南益阳人,教授,博士,CCF会员,主要研究方向:机器学习与数据挖掘、推荐系统、计算机视觉、大数据分析、人工智能安全、媒体分析
    沈世铭(1985—),男,广西桂林人,助理研究员,硕士,CCF会员,主要研究方向:环境大数据分析
    李建(1991—),男,山西长治人,助理研究员,硕士,CCF会员,主要研究方向:深度学习与环境遥感。
  • 基金资助:
    广西自然科学基金资助项目(2021JJA170096);广西重点研发计划项目(AB21196063);桂林市重大成果转化基金资助项目(20192013-1);桂林电子科技大学大学生创新创业训练计划项目(202010595031)

Abstract:

Most existing air quality prediction methods predict trends from plain time series data and ignore the laws of pollutant transport and diffusion as well as the corresponding inter-class pattern features. To address this problem, a PM2.5 diffusion characteristic extraction method based on Candlestick Pattern Matching (CPM) was proposed. Firstly, basic periodic candlestick charts were generated from a large number of historical PM2.5 sequences by borrowing the convolution idea of the Convolutional Neural Network (CNN). Then, the concentration patterns of different candlestick chart feature vectors were clustered and analyzed by using a distance formula. Finally, drawing on the strength of CNN in image recognition, a hybrid model integrating graphical features and time series features was formed and used to judge the trend reversals indicated by candlestick charts that carry reversal signals. Experimental results on the monitoring time series dataset of the Guilin air quality online monitoring stations show that, compared with the VGG (Visual Geometry Group)-based method that uses only time series data, the CPM-based method improves accuracy by 1.9 percentage points. These results indicate that the CPM-based method can effectively extract the trend features of PM2.5 and can be used to predict periodic changes in future pollutant concentration.
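
As an illustration of the first two steps only, the following minimal Python sketch turns a concentration sequence into candlesticks and assigns each one to the nearest reference pattern. It is not the authors' implementation. The function names, the `stride` parameter (a stand-in for the convolution-style sliding window), the use of Euclidean distance as the "distance formula", the two reference patterns, and all readings are assumptions made up for this example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def to_candlesticks(series, period, stride=None):
    """Summarize each window of `period` samples as (open, high, low, close).

    `stride` is a hypothetical parameter: it defaults to `period`
    (non-overlapping windows); a smaller value slides the window
    in the manner of a convolution kernel.
    """
    stride = period if stride is None else stride
    candles = []
    for start in range(0, len(series) - period + 1, stride):
        window = series[start:start + period]
        candles.append((window[0], max(window), min(window), window[-1]))
    return candles


def nearest_pattern(candle, templates):
    """Return the name of the reference pattern closest to `candle`.

    Euclidean distance is assumed here; the abstract does not say
    which distance formula is used.
    """
    return min(templates, key=lambda name: dist(candle, templates[name]))


# Invented hourly PM2.5 readings (ug/m3), for illustration only.
readings = [30, 35, 42, 40, 41, 38, 33, 29]

# Invented reference patterns as (open, high, low, close) vectors.
templates = {
    "rising": (30, 45, 30, 45),
    "falling": (45, 45, 30, 30),
}

candles = to_candlesticks(readings, period=4)
labels = [nearest_pattern(c, templates) for c in candles]
print(candles)
print(labels)
```

In this sketch each candlestick is reduced to a four-element feature vector, so pattern matching becomes a nearest-neighbor lookup. The CNN-based hybrid model described in the abstract is outside the scope of the example.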

Key words: air pollution phenomenon, candlestick chart theory, pattern matching, Convolutional Neural Network (CNN), PM2.5

摘要:

现有大气质量预测方法多基于单纯的时间序列数据进行趋势预测,忽略了污染物传输和扩散规律及其分类间模式特征的问题。为此,提出一种基于烛台图模式匹配(CPM)的PM2.5(大气细颗粒物污染)扩散特征提取方法。首先,利用基于卷积神经网络(CNN)的卷积思想从大量历史PM2.5序列中生成基础周期烛台图;然后,通过距离公式对不同烛台图特征向量的浓度模式进行聚类分析;最后,结合CNN在图像识别中的独特优势,形成融合图形特征与时序特征序列的混合模型,判断带有反转信号的烛台图将导致的趋势反转情况。在桂林市大气质量在线监测站的监测时序数据集上的实验结果表明,与使用单一时间序列数据的深度卷积神经网络VGG(Visual Geometry Group)相比,基于CPM的提取方法准确率提升了1.9个百分点。可见,基于CPM的方法能有效提取PM2.5趋势特征,可以用于预测未来污染物浓度周期变化。

关键词: 大气污染现象, 烛台图理论, 模式匹配, 卷积神经网络, PM2.5

CLC Number: