《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (12): 3740-3746.DOI: 10.11772/j.issn.1001-9081.2022121828

• 数据科学与技术 • 上一篇    下一篇

对比保序模式挖掘算法

孟玉飞1, 武优西1(), 王珍1, 李艳2   

  1. 1.河北工业大学 人工智能与数据科学学院,天津 300401
    2.河北工业大学 经济管理学院,天津 300401
  • 收稿日期:2022-12-09 修回日期:2023-02-24 接受日期:2023-02-28 发布日期:2023-03-06 出版日期:2023-12-10
  • 通讯作者: 武优西
  • 作者简介:孟玉飞(1995—),女,河北石家庄人,硕士研究生,主要研究方向:数据挖掘
    王珍(1997—),女,河南周口人,硕士研究生,主要研究方向:数据挖掘
    李艳(1975—),女,天津人,副教授,博士,主要研究方向:数据挖掘、供应链管理。
  • 基金资助:
    河北省自然科学基金资助项目(F2020202013)

Contrast order-preserving pattern mining algorithm

Yufei MENG1, Youxi WU1(), Zhen WANG1, Yan LI2   

  1. 1.School of Artificial Intelligence,Hebei University of Technology,Tianjin 300401,China
    2.School of Economics and Management,Hebei University of Technology,Tianjin 300401,China
  • Received:2022-12-09 Revised:2023-02-24 Accepted:2023-02-28 Online:2023-03-06 Published:2023-12-10
  • Contact: Youxi WU
  • About author:MENG Yufei, born in 1995, M. S. candidate. Her research interests include data mining.
    WANG Zhen, born in 1997, M. S. candidate. Her research interests include data mining.
    LI Yan, born in 1975, Ph. D., associate professor. Her research interests include data mining, supply chain management.
  • Supported by:
    Natural Science Foundation of Hebei Province(F20202013)

摘要:

针对现有的对比序列模式挖掘方法主要针对字符序列数据集且难以应用于时间序列数据集的问题,提出一种对比保序模式挖掘(COPM)算法。首先,在候选模式生成阶段,采用模式融合策略减少候选模式数;其次在模式支持度计算阶段,利用子模式的匹配结果计算超模式的支持度;最后,设计了动态最小支持度阈值的剪枝策略,以进一步有效地剪枝候选模式。实验结果表明,在6个真实的时间序列数据集上,在内存消耗方面,COPM算法至少比COPM-o(COPM-original)算法降低52.1%,比COPM-e(COPM-enumeration)算法低36.8%,比COPM-p(COPM-prune)算法降低63.6%;同时在运行时间方面,COPM算法至少比COPM-o算法降低30.3%,比COPM-e算法降低8.8%,比COPM-p算法降低41.2%。因此,在算法性能方面,COPM算法优于COPM-o、COPM-e和COPM-p算法。实验结果验证了COPM算法可以有效挖掘对比保序模式,发现不同类别的时间序列数据集间的差异。

关键词: 模式挖掘, 序列模式挖掘, 时间序列, 对比模式, 保序模式

Abstract:

Aiming at the problem that the existing contrast sequential pattern mining methods mainly focus on character sequence datasets and are difficult to be applied to time series datasets, a new Contrast Order-preserving Pattern Mining (COPM) algorithm was proposed. Firstly, in the candidate pattern generation stage, a pattern fusion strategy was used to reduce the number of candidate patterns. Then, in the pattern support calculation stage, the support of super-pattern was calculated by using the matching results of sub-patterns. Finally, a dynamic pruning strategy of minimum support threshold was designed to further effectively prune the candidate patterns. Experimental results show that on six real time series datasets, the memory consumption of COPM algorithm is at least 52.1% lower than that of COPM-o (COPM-original) algorithm, 36.8% lower than that of COPM-e (COPM-enumeration) algorithm, and 63.6% lower than that of COPM-p (COPM-prune) algorithm. At the same time, the running time of COPM algorithm is at least 30.3% lower than that of COPM-o algorithm, 8.8% lower than that of COPM-e algorithm and 41.2% lower than that of COPM-p algorithm. Therefore, in terms of algorithm performance, COPM algorithm is superior to COPM-o, COPM-e and COPM-p algorithms. The experimental results verify that COPM algorithm can effectively mine the contrast order-preserving patterns to find the differences between different classes of time series datasets.

Key words: pattern mining, sequential pattern mining, time series, contrast pattern, order-preserving pattern

中图分类号: