Journal of Computer Applications, 2023, Vol. 43, Issue (2): 391-397. DOI: 10.11772/j.issn.1001-9081.2021122190

• Data Science and Technology •

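Multi-scale-based incremental mining of partial periodic patterns in time series data

Yaling XUN1, Linqing WANG1, Jianghui CAI1,2, Haifeng YANG1

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. College of Computer Science and Technology, North University of China, Taiyuan 030051, China
  • Received: 2021-12-29 Revised: 2022-05-30 Accepted: 2022-06-10 Online: 2022-06-30 Published: 2023-02-10
  • Corresponding author: Jianghui CAI
  • About the authors: XUN Yaling (born 1980), female, from Huozhou, Shanxi; associate professor, Ph. D., CCF member. Research interests: data mining, parallel computing.
    WANG Linqing (born 1997), male, from Qinhuangdao, Hebei; M. S. candidate. Research interests: data mining, parallel computing.
    YANG Haifeng (born 1980), male, from Jiexiu, Shanxi; professor, Ph. D., CCF senior member. Research interests: big data mining and applications.
  • Funding:
    National Natural Science Foundation of China (62272336); Graduate Education Innovation Project of Shanxi Province (2022Y699)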

Partial periodic pattern incremental mining of time series data based on multi-scale

Yaling XUN1, Linqing WANG1, Jianghui CAI1,2, Haifeng YANG1

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan Shanxi 030024, China
  2. College of Computer Science and Technology, North University of China, Taiyuan Shanxi 030051, China
  • Received: 2021-12-29 Revised: 2022-05-30 Accepted: 2022-06-10 Online: 2022-06-30 Published: 2023-02-10
  • Contact: Jianghui CAI
  • About author: XUN Yaling, born in 1980, Ph. D., associate professor. Her research interests include data mining and parallel computing.
    WANG Linqing, born in 1997, M. S. candidate. His research interests include data mining and parallel computing.
    YANG Haifeng, born in 1980, Ph. D., professor. His research interests include big data mining and its applications.
  • Supported by:
    National Natural Science Foundation of China (62272336); Graduate Education Innovation Project of Shanxi Province (2022Y699)

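Abstract:

Mining partial periodic patterns from dynamic time series data suffers from excessive computational complexity and poor scalability. To address these problems, a time series partial periodic pattern mining algorithm that incorporates multi-scale theory (MSI-PPPGrowth) is proposed. The algorithm exploits the temporal multi-scale characteristics inherent in time series data and introduces multi-scale theory into the partial periodic pattern mining process. First, the scale-divided original data and the incremental time series data are treated as finer-grained benchmark-scale datasets and mined independently. Then, the correlation between data at different scales is used to perform scale transformation, which indirectly yields the global frequent patterns of the dynamically updated dataset and thereby avoids repeated scans of the original dataset and continual adjustment of the tree structure. As part of this step, a new frequent missing count estimation model (PJK-EstimateCount) is designed, based on the Kriging method and taking the periodicity of the time series into account, to effectively estimate the support counts of missing items during scale transformation. Experimental results show that MSI-PPPGrowth has good scalability and real-time performance, and that its performance advantage is most pronounced on dense datasets.

Keywords: frequent itemset mining, time series data, partial periodic pattern, multi-scale, incremental mining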

Abstract:

To address the high computational complexity and poor scalability of mining partial periodic patterns from dynamic time series data, a partial periodic pattern mining algorithm for dynamic time series data combined with multi-scale theory, named MSI-PPPGrowth (Multi-Scale Incremental Partial Periodic Frequent Pattern), was proposed. MSI-PPPGrowth made full use of the inherent multi-scale characteristics of time series data and introduced multi-scale theory into the mining of partial periodic patterns. Firstly, the scale-divided original data and the incremental time series data were each treated as finer-grained benchmark-scale datasets and mined independently. Then, the correlation between data at different scales was used to perform scale transformation, so that the global frequent patterns of the dynamically updated dataset were obtained indirectly. In this way, repeated scanning of the original dataset and constant adjustment of the tree structure were avoided. As part of this process, a new frequent missing count estimation model, PJK-EstimateCount, was designed based on the Kriging method, with the periodicity of the time series taken into account, to effectively estimate the support counts of missing items during scale transformation. Experimental results show that MSI-PPPGrowth has good scalability and real-time performance, and that its performance advantage is especially significant on dense datasets.

Key words: frequent itemset mining, time series data, partial periodic pattern, multi-scale, incremental mining
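
The two-stage idea summarized in the abstract (mine each finer-grained partition independently, then combine the local results into global counts) can be illustrated with a minimal sketch. It is not the authors' MSI-PPPGrowth: it counts plain frequent itemsets rather than partial periodic patterns, all function names are hypothetical, and the missing-count estimator is a naive upper bound standing in for the Kriging-based PJK-EstimateCount model, whose details are not given here.

```python
import math
from collections import Counter
from itertools import combinations


def mine_partition(transactions, min_ratio, max_len=2):
    """Return {itemset: count} for itemsets that are frequent in one partition."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    threshold = min_ratio * len(transactions)
    return {p: c for p, c in counts.items() if c >= threshold}


def upper_bound_estimate(partition_size, min_ratio):
    """Naive stand-in (NOT Kriging): a pattern pruned locally has a count
    strictly below the local threshold, so use the largest such integer."""
    return max(math.ceil(min_ratio * partition_size) - 1, 0)


def merge_local_results(local_results, sizes, min_ratio,
                        estimate=upper_bound_estimate):
    """Combine per-partition results into global frequent itemsets
    without rescanning any partition's raw transactions."""
    total = sum(sizes)
    candidates = set().union(*local_results)
    merged = {}
    for pattern in candidates:
        count = 0
        for freq, size in zip(local_results, sizes):
            # Use the exact local count when known, otherwise an estimate.
            count += freq[pattern] if pattern in freq else estimate(size, min_ratio)
        if count >= min_ratio * total:
            merged[pattern] = count
    return merged


# Original data, already split into two finer-grained partitions.
partitions = [
    [["a", "b"], ["a", "b"], ["a", "c"], ["b"]],
    [["a", "b"], ["a", "c"], ["c"], ["a", "b"]],
]
local_results = [mine_partition(p, 0.5) for p in partitions]
sizes = [len(p) for p in partitions]

# An increment arrives: only the new partition is mined, then merged.
increment = [["a", "b"], ["b", "c"], ["a", "b"], ["a"]]
local_results.append(mine_partition(increment, 0.5))
sizes.append(len(increment))
updated = merge_local_results(local_results, sizes, 0.5)
```

Because the cached `local_results` are reused, handling the increment costs one pass over the new partition plus a merge. The upper-bound estimator can overestimate a pattern's true count, which is the kind of error a more careful estimation model would aim to reduce.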

CLC Number: