Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2713-2721. DOI: 10.11772/j.issn.1001-9081.2021071311
• Data science and technology •
Jun WU, Aijia OUYANG, Lin ZHANG
Received: 2021-07-19
Revised: 2021-10-22
Accepted: 2021-10-25
Online: 2021-11-10
Published: 2022-09-10
Contact: Jun WU
About author: OUYANG Aijia, born in 1975 in Loudi, Hunan, Ph. D., professor, CCF member. His research interests include intelligent computing and parallel computing.
Supported by:
CLC Number:
Jun WU, Aijia OUYANG, Lin ZHANG. Statistically significant sequential patterns mining algorithm under influence degree[J]. Journal of Computer Applications, 2022, 42(9): 2713-2721.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071311
Sequential pattern | Support/% | Sequential pattern | Support/% |
---|---|---|---|
 | 13.0 | | 8.6 |
 | 9.8 | | 8.0 |
 | 9.1 | | |
Tab. 1 Top-5 sequential patterns with largest degree of support in ATS dataset
Sequential record dataset | Number of records | Number of items | Average length | Repeated items |
---|---|---|---|---|
Book | 788 | 3 844 | 96.5 | Yes |
Unix | 4 015 | 1 103 | 26.4 | Yes |
Peptide | 15 784 | 20 | 27.0 | Yes |
Bike | 21 078 | 67 | 7.3 | Yes |
Tab. 2 Information of real-world sequential record datasets
Pattern No. | PSPM | SPDL | PSDSP | ISSPM |
---|---|---|---|---|
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
Tab. 3 Top 7 2-length sequential patterns with the largest interestingness reported by each algorithm on Book dataset
Algorithm | Number of sequential patterns | Number of non-false positive patterns | Number of false positive patterns |
---|---|---|---|
PSPM | 17 925.6 | 3 176.4 | 14 749.2 |
SPDL | 10 864.5 | 2 284.6 | 8 579.9 |
PSDSP | 1 986.4 | 1 896.8 | 89.6 |
ISSPM_FDR | 1 216.2 | 1 174.9 | 41.3 |
ISSPM_FWER | 868.7 | 860.6 | 8.1 |
Tab. 4 Number of non-false positive patterns and false positive patterns reported by each algorithm on synthetic sequential pattern datasets
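The counts in Tab. 4 can be turned into a false positive share per algorithm, which makes the rows easier to compare. The snippet below is a minimal sketch, not part of the paper: the helper name `false_positive_share` and the dictionary keys are illustrative assumptions, and reading the last two rows as ISSPM under FDR and FWER control is an interpretation of the row labels.

```python
# Average counts per algorithm, copied from Tab. 4:
# (reported patterns, non-false positive patterns, false positive patterns).
# The keys are illustrative labels, not identifiers from the paper.
TAB4 = {
    "PSPM": (17925.6, 3176.4, 14749.2),
    "SPDL": (10864.5, 2284.6, 8579.9),
    "PSDSP": (1986.4, 1896.8, 89.6),
    "ISSPM_FDR": (1216.2, 1174.9, 41.3),
    "ISSPM_FWER": (868.7, 860.6, 8.1),
}


def false_positive_share(total, false_positives):
    """Return the fraction of reported patterns that are false positives."""
    return false_positives / total


for name, (total, non_fp, fp) in TAB4.items():
    # The two sub-counts should add up to the reported total (within rounding).
    assert abs(non_fp + fp - total) < 0.05
    print(f"{name}: {100 * false_positive_share(total, fp):.1f}% false positives")
```

On these figures, roughly four out of five patterns reported by PSPM and SPDL are false positives, while the share stays below 5% for PSDSP and both ISSPM rows.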
Dataset No. | PSPM | SPDL | PSDSP | ISSPM |
---|---|---|---|---|
1 | 0.0 | 66.7 | 0.0 | 100.0 |
2 | 11.1 | 44.4 | 11.1 | 88.9 |
3 | 22.2 | 44.4 | 22.2 | 77.8 |
4 | 0.0 | 55.6 | 0.0 | 100.0 |
5 | 11.1 | 44.4 | 11.1 | 77.8 |
6 | 22.2 | 33.3 | 22.2 | 66.7 |
7 | 22.2 | 33.3 | 22.2 | 77.8 |
8 | 0.0 | 66.7 | 0.0 | 100.0 |
9 | 22.2 | 33.3 | 22.2 | 66.7 |
10 | 11.1 | 44.4 | 11.1 | 88.9 |
Tab. 5 Embedded patterns discovery rate reported by each algorithm on synthetic sequential pattern datasets
Algorithm | Mining phase | Evaluation phase | Total |
---|---|---|---|
PSPM | 236.2 | | 236.2 |
SPDL | 272.5 | | 272.5 |
PSDSP | 72.7 | 7 163.2 | 7 235.9 |
ISSPM | 215.4 | 581.3 | 796.7 |
Tab. 6 Average running time of each algorithm on synthetic sequential pattern datasets