《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (12): 3740-3746.DOI: 10.11772/j.issn.1001-9081.2022121828
所属专题: 数据科学与技术
收稿日期:
2022-12-09
修回日期:
2023-02-24
接受日期:
2023-02-28
发布日期:
2023-03-06
出版日期:
2023-12-10
通讯作者:
武优西
作者简介:
孟玉飞(1995—),女,河北石家庄人,硕士研究生,主要研究方向:数据挖掘基金资助:
Yufei MENG1, Youxi WU1(), Zhen WANG1, Yan LI2
Received:
2022-12-09
Revised:
2023-02-24
Accepted:
2023-02-28
Online:
2023-03-06
Published:
2023-12-10
Contact:
Youxi WU
About author:
MENG Yufei, born in 1995, M. S. candidate. Her research interests include data mining.Supported by:
摘要:
针对现有的对比序列模式挖掘方法主要针对字符序列数据集且难以应用于时间序列数据集的问题,提出一种对比保序模式挖掘(COPM)算法。首先,在候选模式生成阶段,采用模式融合策略减少候选模式数;其次在模式支持度计算阶段,利用子模式的匹配结果计算超模式的支持度;最后,设计了动态最小支持度阈值的剪枝策略,以进一步有效地剪枝候选模式。实验结果表明,在6个真实的时间序列数据集上,在内存消耗方面,COPM算法至少比COPM-o(COPM-original)算法降低52.1%,比COPM-e(COPM-enumeration)算法低36.8%,比COPM-p(COPM-prune)算法降低63.6%;同时在运行时间方面,COPM算法至少比COPM-o算法降低30.3%,比COPM-e算法降低8.8%,比COPM-p算法降低41.2%。因此,在算法性能方面,COPM算法优于COPM-o、COPM-e和COPM-p算法。实验结果验证了COPM算法可以有效挖掘对比保序模式,发现不同类别的时间序列数据集间的差异。
中图分类号:
孟玉飞, 武优西, 王珍, 李艳. 对比保序模式挖掘算法[J]. 计算机应用, 2023, 43(12): 3740-3746.
Yufei MENG, Youxi WU, Zhen WANG, Yan LI. Contrast order-preserving pattern mining algorithm[J]. Journal of Computer Applications, 2023, 43(12): 3740-3746.
序号 | 时间序列 | 类别标签 |
---|---|---|
1 | 15,10,18,12,19,14 | D+ |
2 | 13,19,16,22,18 | |
3 | 15,20,17,22,20 | |
4 | 12,13,11,10 | D- |
5 | 15,16,13,14 | |
6 | 12,15,19,11 |
表1 时间序列数据库D示例
Tab.1 Example of time series database D
序号 | 时间序列 | 类别标签 |
---|---|---|
1 | 15,10,18,12,19,14 | D+ |
2 | 13,19,16,22,18 | |
3 | 15,20,17,22,20 | |
4 | 12,13,11,10 | D- |
5 | 15,16,13,14 | |
6 | 12,15,19,11 |
数据集 | 名称 | 单条时间 序列长度 | 数据库中序列数 | 总长度 |
---|---|---|---|---|
D1 | Chinatown | 24 | 345 | 8 280 |
D2 | BeetleFly | 512 | 20 | 10 240 |
D3 | GunPoint | 150 | 150 | 22 500 |
D4 | ItalyPowerDemand | 24 | 1 029 | 24 696 |
D5 | PowerCons | 144 | 180 | 25 920 |
D6 | ShapeletSim | 500 | 180 | 90 000 |
表2 实验数据集
Tab.2 Experimental datasets
数据集 | 名称 | 单条时间 序列长度 | 数据库中序列数 | 总长度 |
---|---|---|---|---|
D1 | Chinatown | 24 | 345 | 8 280 |
D2 | BeetleFly | 512 | 20 | 10 240 |
D3 | GunPoint | 150 | 150 | 22 500 |
D4 | ItalyPowerDemand | 24 | 1 029 | 24 696 |
D5 | PowerCons | 144 | 180 | 25 920 |
D6 | ShapeletSim | 500 | 180 | 90 000 |
指标 | 算法 | D1 | D2 | D3 | D4 | D5 | D6 |
---|---|---|---|---|---|---|---|
内存/ MB | COPM-o | 67.451 | 52.318 | 63.812 | 72.269 | 55.952 | 52.763 |
COPM-e | 29.905 | 80.707 | 121.784 | 45.925 | 35.103 | 34.782 | |
COPM-p | 189.235 | 66.103 | 236.548 | 151.224 | 76.025 | 60.419 | |
COPM | 16.284 | 22.194 | 30.575 | 10.074 | 20.217 | 21.973 | |
运行 时间/ s | COPM-o | 0.607 | 0.261 | 0.501 | 1.302 | 0.313 | 0.479 |
COPM-e | 0.221 | 0.809 | 0.431 | 0.341 | 0.217 | 0.293 | |
COPM-p | 336.734 | 0.391 | 1.038 | 185.176 | 0.398 | 0.369 | |
COPM | 0.165 | 0.182 | 0.271 | 0.148 | 0.198 | 0.217 | |
候选 模式 数 | COPM-o | 161 | 202 | 275 | 142 | 171 | 190 |
COPM-e | 235 | 2 208 | 1 359 | 333 | 431 | 494 | |
COPM-p | 29 925 | 375 | 2 159 | 25 167 | 978 | 894 | |
COPM | 161 | 202 | 275 | 142 | 171 | 190 |
表3 不同算法在数据集D1~D6上不同指标对比
Tab.3 Comparison of different algorithms with different indicators on dataset D1 to D6
指标 | 算法 | D1 | D2 | D3 | D4 | D5 | D6 |
---|---|---|---|---|---|---|---|
内存/ MB | COPM-o | 67.451 | 52.318 | 63.812 | 72.269 | 55.952 | 52.763 |
COPM-e | 29.905 | 80.707 | 121.784 | 45.925 | 35.103 | 34.782 | |
COPM-p | 189.235 | 66.103 | 236.548 | 151.224 | 76.025 | 60.419 | |
COPM | 16.284 | 22.194 | 30.575 | 10.074 | 20.217 | 21.973 | |
运行 时间/ s | COPM-o | 0.607 | 0.261 | 0.501 | 1.302 | 0.313 | 0.479 |
COPM-e | 0.221 | 0.809 | 0.431 | 0.341 | 0.217 | 0.293 | |
COPM-p | 336.734 | 0.391 | 1.038 | 185.176 | 0.398 | 0.369 | |
COPM | 0.165 | 0.182 | 0.271 | 0.148 | 0.198 | 0.217 | |
候选 模式 数 | COPM-o | 161 | 202 | 275 | 142 | 171 | 190 |
COPM-e | 235 | 2 208 | 1 359 | 333 | 431 | 494 | |
COPM-p | 29 925 | 375 | 2 159 | 25 167 | 978 | 894 | |
COPM | 161 | 202 | 275 | 142 | 171 | 190 |
算法 | 内存消耗/MB | 运行时间/s | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
D6_1 | D6_2 | D6_3 | D6_4 | D6_5 | D6_6 | D6_1 | D6_2 | D6_3 | D6_4 | D6_5 | D6_6 | |
COPM-o | 52.763 | 80.916 | 130.892 | 172.057 | 248.719 | 296.423 | 0.479 | 0.833 | 1.464 | 1.923 | 2.641 | 3.074 |
COPM-e | 34.782 | 54.824 | 105.923 | 143.471 | 223.286 | 280.066 | 0.293 | 0.431 | 0.736 | 0.978 | 1.155 | 1.459 |
COPM-p | 60.419 | 102.129 | 142.494 | 184.845 | 255.101 | 303.965 | 0.369 | 0.455 | 0.788 | 1.139 | 1.364 | 1.646 |
COPM | 21.973 | 32.733 | 48.299 | 66.712 | 83.856 | 94.395 | 0.217 | 0.319 | 0.416 | 0.535 | 0.768 | 0.923 |
表4 不同数据集大小下的内存消耗和运行时间对比
Tab.4 Comparison of memory consumption and running time with different dataset sizes
算法 | 内存消耗/MB | 运行时间/s | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
D6_1 | D6_2 | D6_3 | D6_4 | D6_5 | D6_6 | D6_1 | D6_2 | D6_3 | D6_4 | D6_5 | D6_6 | |
COPM-o | 52.763 | 80.916 | 130.892 | 172.057 | 248.719 | 296.423 | 0.479 | 0.833 | 1.464 | 1.923 | 2.641 | 3.074 |
COPM-e | 34.782 | 54.824 | 105.923 | 143.471 | 223.286 | 280.066 | 0.293 | 0.431 | 0.736 | 0.978 | 1.155 | 1.459 |
COPM-p | 60.419 | 102.129 | 142.494 | 184.845 | 255.101 | 303.965 | 0.369 | 0.455 | 0.788 | 1.139 | 1.364 | 1.646 |
COPM | 21.973 | 32.733 | 48.299 | 66.712 | 83.856 | 94.395 | 0.217 | 0.319 | 0.416 | 0.535 | 0.768 | 0.923 |
1 | LIN Y, LIN Z X, LIAO Y, et al. Forecasting the realized volatility of stock price index: a hybrid model integrating CEEMDAN and LSTM [J]. Expert Systems with Applications, 2022, 206: 117736. 10.1016/j.eswa.2022.117736 |
2 | MAN J, DONG H H, GAO J Y, et al. GA-GRGAT: a novel deep learning model for high-speed train axle temperature long term forecasting [J]. Expert Systems with Applications, 2022, 202: 117033. 10.1016/j.eswa.2022.117033 |
3 | DAI C L, WU J, PI D C, et al. Brain EEG time-series clustering using maximum-weight clique [J]. IEEE Transactions on Cybernetics, 2022, 52(1): 357-371. 10.1109/tcyb.2020.2974776 |
4 | CHAKRABARTI K, KEOGH E, MEHROTRA S, et al. Locally adaptive dimensionality reduction for indexing large time series databases [J]. ACM Transactions on Database Systems, 2002, 27(2): 188-228. 10.1145/568518.568520 |
5 | LIN J, KEOGH E, WEI L, et al. Experiencing SAX: a novel symbolic representation of time series [J]. Data Mining and Knowledge Discovery. 2007, 15(2): 107-144. 10.1007/s10618-007-0064-z |
6 | KIM J, EADES P, FLEISCHER R, et al. Order-preserving matching [J]. Theoretical Computer Science, 2014, 525(13): 68-79. 10.1016/j.tcs.2013.10.006 |
7 | 武优西, 刘茜, 闫文杰, 等. 无重叠条件严格模式匹配的高效求解算法[J]. 软件学报, 2021, 32(11): 3331-3350. 10.13328/j.cnki.jos.006054 |
WU Y X, LIU X, YAN W J, et al. Efficient algorithm for solving strict pattern matching under nonoverlapping condition [J]. Journal of Software, 2021, 32(11): 3331-3350. 10.13328/j.cnki.jos.006054 | |
8 | CHO S, NA J C, PARK K, et al. A fast algorithm for order-preserving pattern matching [J]. Information Processing Letters, 2015, 115(2): 397-402. 10.1016/j.ipl.2014.10.018 |
9 | LI Y, YU L, LIU J, et al. NetDPO: (delta, gamma)-approximate pattern matching with gap constraints under one-off condition [J]. Applied Intelligence, 2022, 52(11): 12155-12174. 10.1007/s10489-021-03000-2 |
10 | CHHABRA T, TARHIO J. A filtration method for order-preserving matching [J]. Information Processing Letters, 2016, 116(2): 71-74. 10.1016/j.ipl.2015.10.005 |
11 | KIM Y, KANG M, NA J C, et al. Order-preserving pattern matching with scaling [J]. Information Processing Letters, 2023, 180: 106333. 10.1016/j.ipl.2022.106333 |
12 | WU Y X, HU Q, LI Y, et al. OPP-Miner: order-preserving sequential pattern mining for time series [EB/OL]. [2022-02-09]. . 10.1109/tcyb.2022.3169327 |
13 | WU Y X, ZHAO X Q, LI Y, et al. OPR-Miner: order-preserving rule mining for time series [EB/OL]. [2022-12-04]. . 10.1109/tkde.2022.3224963 |
14 | WU Y X, WANG Y H, LI Y, et al. Top-k self-adaptive contrast sequential pattern mining [J]. IEEE Transactions on Cybernetics, 2022, 52(11): 11819-11833. 10.1109/tcyb.2021.3082114 |
15 | GAN W S, LIN J C, FOURNIER-VIGER P, et al. A survey of parallel sequential pattern mining [J]. ACM Transactions on Knowledge Discovery from Data, 2019, 13(3): 1-34. 10.1145/3314107 |
16 | OKOLICA J S, PETERSON G L, MILLS R F, et al. Sequence pattern mining with variables [J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(1): 177-187. 10.1109/tkde.2018.2881675 |
17 | SAMARIYA D, MA J. A new dimensionality-unbiased score for efficient and effective outlying aspect mining[J]. Data Science and Engineering, 2022, 7(2): 120-135. 10.1007/s41019-022-00185-5 |
18 | 李艳辉, 刘浩, 袁野, 等. 基于差分隐私的频繁序列模式挖掘算法[J]. 计算机应用, 2017, 37(2): 316-321. 10.11772/j.issn.1001-9081.2017.02.0316 |
LI Y H, LIU H, YUAN Y, et al. Frequent sequence pattern mining with differential privacy [J]. Journal of Computer Applications, 2017, 37(2): 316-321. 10.11772/j.issn.1001-9081.2017.02.0316 | |
19 | WU Y X, TONG Y, ZHU X Q, et al. NOSEP: nonoverlapping sequence pattern mining with gap constraints [J]. IEEE Transactions on Cybernetics, 2018, 48(10): 2809-2822. 10.1109/tcyb.2017.2750691 |
20 | 武优西, 周坤, 刘靖宇, 等. 周期性一般间隙约束的序列模式挖掘[J]. 计算机学报, 2017, 40(6): 1338-1352. 10.11897/SP.J.1016.2017.01338 |
WU Y X, ZHOU K, LIU J Y, et al. Mining sequential patterns with periodic general gap constraints [J]. Chinese Journal of Computers, 2017, 40(6): 1338-1352. 10.11897/SP.J.1016.2017.01338 | |
21 | LI Y, ZHANG S, GUO L, et al. NetNMSP: nonoverlapping maximal sequential pattern mining [J]. Applied Intelligence. 2022, 52(9): 9861-9884. 10.1007/s10489-021-02912-3 |
22 | WU Y X, ZHU C R, LI Y, et al. NetNCSP: nonoverlapping closed sequential pattern mining [J]. Knowledge-Based Systems, 2020, 196: 105812. 10.1016/j.knosys.2020.105812 |
23 | WANG L Z, BAO X G, ZHOU L H. Redundancy reduction for prevalent co-location patterns [J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(1): 142-155. 10.1109/tkde.2017.2759110 |
24 | 王晓璇, 王丽珍, 陈红梅, 等. 基于特征效用参与率的空间高效用co-location模式挖掘方法[J]. 计算机学报, 2019, 42(8): 1721-1738. 10.11897/SP.J.1016.2019.01721 |
WANG X X, WANG L Z, CHEN H M, et al. Mining spatial high utility co-location patterns based on feature utility ratio [J]. Chinese Journal of Computers, 2019, 42(8): 1721-1738. 10.11897/SP.J.1016.2019.01721 | |
25 | 马董, 陈红梅, 王丽珍, 等. 空间亚频繁co-location模式的主导特征挖掘[J]. 计算机应用, 2020, 40(2): 465-472. |
MA D, CHEN H M, WANG L Z, et al. Dominant feature mining of spatial sub-prevalent co-location patterns [J]. Journal of Computer Applications, 2020, 40(2): 465-472. | |
26 | LIU C H, GUO C H. Mining top-N high-utility operation patterns for taxi drivers [J]. Expert Systems with Applications, 2021, 170: 114546. 10.1016/j.eswa.2020.114546 |
27 | WU Y X, GENG M, LI Y, et al. HANP-Miner: high average utility nonoverlapping sequential pattern mining [J]. Knowledge-Based Systems, 2021, 229: 107361. 10.1016/j.knosys.2021.107361 |
28 | GAN W S, LIN J C, FOURNIER-VIGER P, et al. HUOPM: high-utility occupancy pattern mining [J]. IEEE Transactions on Cybernetics, 2020, 50(3): 1195-1208. 10.1109/tcyb.2019.2896267 |
29 | WU Y X, CHEN M J, LI Y, et al. ONP-Miner: one-off negative sequential pattern mining [EB/OL]. [2022-07-25]. . 10.1145/3549940 |
30 | DONG X J, GONG Y S, CAO L B. e-RNSP: an efficient method for mining repetition negative sequential patterns [J]. IEEE Transactions on Cybernetics, 2020, 50(5): 2084-2096. 10.1109/tcyb.2018.2869907 |
31 | WU Y X, LUO L F, LI Y, et al. NTP-Miner: nonoverlapping three-way sequential pattern mining [J]. ACM Transaction on Knowledge Discovery from Data, 2022, 16(3): 51. 10.1145/3480245 |
32 | JI X N, BAILEY J, DONG G Z. Mining minimal distinguishing subsequence patterns with gap constraints [J]. Knowledge and Information Systems, 2007, 11(3): 259-286. 10.1007/s10115-006-0038-2 |
33 | WANG X M, DUAN L, DONG G Z, et al. Efficient mining of density-aware distinguishing sequential patterns with gap constraints [J]. Database Systems for Advanced Applications, 2014, 8421: 372-387. 10.1007/978-3-319-05810-8_25 |
34 | 段磊, 唐常杰, 杨宁, 等. 基于显露模式的对比挖掘研究及应用进展[J]. 计算机应用, 2012, 32(2): 304-308. 10.3724/sp.j.1087.2012.00304 |
DUAN L, TANG C J, YANG N, et al. Survey on emerging pattern based contrast mining and applications [J]. Journal of Computer Applications, 2012, 32(2): 304-308. 10.3724/sp.j.1087.2012.00304 | |
35 | WU Y X, WANG Y H, LIU J Y, et al. Mining distinguishing subsequence patterns with nonoverlapping condition [J]. Cluster Computing, 2019, 22: 5905-5907. 10.1007/s10586-017-1671-0 |
36 | HE Z Y, ZHANG S M, GU F Y, et al. Mining conditional discriminative sequential patterns [J]. Information Sciences, 2019, 478: 524-539. 10.1016/j.ins.2018.11.043 |
37 | 武优西,户倩,郭媛,等.保序序列模式挖掘方法: CN202010544303.5 [P]. 2020-06-15. |
WU Y X, H Q, GUO Y, et al. Order-preserving sequential pattern mining algorithm: CN202010544303.5 [P]. 2020-06-15. | |
38 | 武优西,赵晓倩,李艳,等.保序序列规则挖掘方法: CN202210294476.5 [P]. 2022-03-23. 10.1109/tkde.2022.3224963 |
WU Y X, ZHAO X Q, LI Y, et al. Order-preserving sequential rule mining algorithm: CN202210294476.5 [P]. 2022-03-23. 10.1109/tkde.2022.3224963 | |
39 | 武优西,刘锦,耿萌,等.近似保序序列模式挖掘方法: CN202210295950.6 [P]. 2022-03-23. 10.1109/tcyb.2022.3169327 |
WU Y X, LIU J, GENG M, et al. Approximate order-preserving sequential pattern mining algorithm: CN202210295950.6 [P]. 2022-03-23. 10.1109/tcyb.2022.3169327 |
[1] | 任烈弘, 黄铝文, 田旭, 段飞. 基于DFT的频率敏感双分支Transformer多变量长时间序列预测方法[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2739-2746. |
[2] | 赵秦壮, 谭红叶. 基于自适应阈值学习的时序因果推断方法[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2660-2666. |
[3] | 范黎林, 曹富康, 王琬婷, 杨凯, 宋钊瑜. 基于需求模式自适应匹配的间歇性需求预测方法[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2747-2755. |
[4] | 王美, 苏雪松, 刘佳, 殷若南, 黄珊. 时频域多尺度交叉注意力融合的时间序列分类方法[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1842-1847. |
[5] | 袁子璇, 翁小清, 戈宁振. 基于正交局部保持映射和成本优化的多变量时间序列早期分类模型[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1832-1841. |
[6] | 徐泽鑫, 杨磊, 李康顺. 较短的长序列时间序列预测模型[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1824-1831. |
[7] | 孟凡, 杨群力, 霍静, 王新宽. 基于边缘异常候选集的迭代式主动多元时序异常检测算法[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1458-1463. |
[8] | 王华华, 张旭, 李峰. 面向高速移动环境的二级信号检测算法[J]. 《计算机应用》唯一官方网站, 2024, 44(4): 1236-1241. |
[9] | 任帅, 纪元法, 孙希延, 韦照川, 林子安. 基于改进灰狼优化与支持向量回归的滑坡位移预测[J]. 《计算机应用》唯一官方网站, 2024, 44(3): 972-982. |
[10] | 杨克帅, 武优西, 耿萌, 刘靖宇, 李艳. 一次性条件下top-k高平均效用序列模式挖掘算法[J]. 《计算机应用》唯一官方网站, 2024, 44(2): 477-484. |
[11] | 曾渝, 张洋, 曾尚, 付茂栗, 何启学, 曾林隆. 基于多尺度门控膨胀卷积网络的时间序列预测算法[J]. 《计算机应用》唯一官方网站, 2024, 44(11): 3427-3434. |
[12] | 范艺扬, 张洋, 曾尚, 曾渝, 付茂栗. 基于分解和频域特征提取的多变量长时间序列预测模型[J]. 《计算机应用》唯一官方网站, 2024, 44(11): 3442-3448. |
[13] | 赵培, 乔焰, 胡荣耀, 袁新宇, 李敏悦, 张本初. 基于多域特征提取的多变量时间序列异常检测[J]. 《计算机应用》唯一官方网站, 2024, 44(11): 3419-3426. |
[14] | 蒋辉, 闫秋艳, 姜竹郡. 面向多元时间序列异常检测的对称正定自编码器方法[J]. 《计算机应用》唯一官方网站, 2024, 44(10): 3294-3299. |
[15] | 叶力硕, 何志学. 融合小波分解的多尺度时间序列异常检测[J]. 《计算机应用》唯一官方网站, 2024, 44(10): 3300-3306. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||