Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2713-2721. DOI: 10.11772/j.issn.1001-9081.2021071311
Data science and technology
Jun WU, Aijia OUYANG, Lin ZHANG
Received: 2021-07-19
Revised: 2021-10-22
Accepted: 2021-10-25
Online: 2021-11-10
Published: 2022-09-10
Corresponding author: Jun WU
About author: OUYANG Aijia, born in 1975 in Loudi, Hunan, Ph. D., professor, CCF member. His research interests include intelligent computing and parallel computing.
Jun WU, Aijia OUYANG, Lin ZHANG. Statistically significant sequential patterns mining algorithm under influence degree[J]. Journal of Computer Applications, 2022, 42(9): 2713-2721.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071311
Sequential pattern | Support/% | Sequential pattern | Support/% |
---|---|---|---|
 | 13.0 | | 8.6 |
 | 9.8 | | 8.0 |
 | 9.1 | | |
Tab. 1 Top-5 sequential patterns with the largest support in the ATS dataset
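Table 1 ranks patterns by support, given as a percentage. This page does not restate the definition, so the sketch below assumes the common one: the support of a sequential pattern is the share of sequence records that contain its items in the same order, with gaps allowed. The helper names `is_subsequence` and `support_percent` and the toy records are hypothetical and are not taken from the paper or the ATS data.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], record: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `record` in the same order (gaps allowed)."""
    it = iter(record)
    # `item in it` consumes the iterator up to the first match,
    # so each later item must be found after the previous one
    return all(item in it for item in pattern)


def support_percent(pattern: Sequence[str], records: list[Sequence[str]]) -> float:
    """Percentage of records that contain `pattern` as an ordered subsequence."""
    hits = sum(1 for record in records if is_subsequence(pattern, record))
    return 100.0 * hits / len(records)


if __name__ == "__main__":
    # Invented toy records for illustration only
    records = [["a", "b", "c"], ["a", "c"], ["b", "a"], ["c", "a", "b"]]
    print(support_percent(["a", "b"], records))  # 50.0
```

Only the first and last records contain `a` followed later by `b`, so the toy pattern has 50% support.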
Sequence record set | Records | Items | Average length | Repeated items |
---|---|---|---|---|
Book | 788 | 3 844 | 96.5 | Yes |
Unix | 4 015 | 1 103 | 26.4 | Yes |
Peptide | 15 784 | 20 | 27.0 | Yes |
Bike | 21 078 | 67 | 7.3 | Yes |
Tab. 2 Statistics of the real-world sequence record datasets
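The columns of Table 2 can be derived from the raw sequence records. The sketch below is hedged: it assumes each record is a list of item identifiers, that "Items" counts distinct items, that "Average length" is the mean number of item occurrences per record, and that "Repeated items: Yes" means at least one record contains the same item more than once. `describe` and `DatasetStats` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DatasetStats:
    records: int       # number of sequence records
    items: int         # number of distinct items across all records
    avg_length: float  # mean number of item occurrences per record
    has_repeats: bool  # True if some record contains the same item more than once


def describe(records: list[Sequence[str]]) -> DatasetStats:
    """Compute Table-2-style statistics for a non-empty list of records."""
    distinct: set[str] = set()
    total_length = 0
    has_repeats = False
    for record in records:
        distinct.update(record)
        total_length += len(record)
        if len(set(record)) < len(record):
            has_repeats = True
    return DatasetStats(
        records=len(records),
        items=len(distinct),
        avg_length=total_length / len(records),
        has_repeats=has_repeats,
    )


if __name__ == "__main__":
    # Invented toy data, not one of the datasets in Table 2
    print(describe([["a", "b", "a"], ["c"]]))
    # DatasetStats(records=2, items=3, avg_length=2.0, has_repeats=True)
```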
Pattern no. | PSPM | SPDL | PSDSP | ISSPM |
---|---|---|---|---|
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
Tab. 3 Top-7 length-2 sequential patterns with the largest interestingness reported by each algorithm on the Book dataset
Algorithm | Reported sequential patterns | Non-false-positive patterns | False-positive patterns |
---|---|---|---|
PSPM | 17 925.6 | 3 176.4 | 14 749.2 |
SPDL | 10 864.5 | 2 284.6 | 8 579.9 |
PSDSP | 1 986.4 | 1 896.8 | 89.6 |
ISSPM (FDR) | 1 216.2 | 1 174.9 | 41.3 |
ISSPM (FWER) | 868.7 | 860.6 | 8.1 |
Tab. 4 Numbers of non-false-positive and false-positive patterns reported by each algorithm on the synthetic sequential pattern datasets
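Each row of Table 4 satisfies reported = non-false-positive + false-positive, and the fractional counts suggest averages over the synthetic datasets (an inference). The ratio of false positives to reported patterns is the empirical false-discovery proportion, the quantity whose expectation FDR control bounds. Reading the two ISSPM rows as variants that control the false discovery rate and the family-wise error rate is an assumption based on their labels. The sketch computes the proportion from the table's own numbers; the dictionary keys and the `false_discovery_proportion` helper are hypothetical.

```python
# Counts copied from Table 4: (reported, non-false-positive, false-positive)
TABLE4 = {
    "PSPM": (17925.6, 3176.4, 14749.2),
    "SPDL": (10864.5, 2284.6, 8579.9),
    "PSDSP": (1986.4, 1896.8, 89.6),
    "ISSPM_FDR": (1216.2, 1174.9, 41.3),
    "ISSPM_FWER": (868.7, 860.6, 8.1),
}


def false_discovery_proportion(reported: float, false_pos: float) -> float:
    """Share of reported patterns that are false positives, in percent (0 if none reported)."""
    return 100.0 * false_pos / reported if reported else 0.0


for name, (reported, non_false_pos, false_pos) in TABLE4.items():
    # Sanity check: the two columns should add up to the reported total
    assert abs(non_false_pos + false_pos - reported) < 0.05
    print(f"{name:11s} {false_discovery_proportion(reported, false_pos):5.1f}%")
```

On these numbers the proportion falls from about 82% for PSPM and 79% for SPDL to about 4.5% for PSDSP and to about 3.4% and 0.9% for the two ISSPM variants. Because the inputs appear to be averages, this is a ratio of means rather than the mean per-dataset proportion.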
Dataset no. | PSPM | SPDL | PSDSP | ISSPM |
---|---|---|---|---|
1 | 0.0 | 66.7 | 0.0 | 100.0 |
2 | 11.1 | 44.4 | 11.1 | 88.9 |
3 | 22.2 | 44.4 | 22.2 | 77.8 |
4 | 0.0 | 55.6 | 0.0 | 100.0 |
5 | 11.1 | 44.4 | 11.1 | 77.8 |
6 | 22.2 | 33.3 | 22.2 | 66.7 |
7 | 22.2 | 33.3 | 22.2 | 77.8 |
8 | 0.0 | 66.7 | 0.0 | 100.0 |
9 | 22.2 | 33.3 | 22.2 | 66.7 |
10 | 11.1 | 44.4 | 11.1 | 88.9 |
Tab. 5 Discovery rate (%) of the embedded patterns reported by each algorithm on the synthetic sequential pattern datasets
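Table 5's discovery rate is read here as the percentage of the patterns planted in a synthetic dataset that an algorithm recovers; that definition is an assumption. Every entry is a multiple of roughly 11.1, which is consistent with, though not proof of, nine embedded patterns per dataset. The sketch below, with a hypothetical `discovery_rate` helper, averages the table's values over the ten datasets.

```python
from statistics import mean

# Discovery rates (%) copied from Table 5, one value per synthetic dataset 1-10
TABLE5 = {
    "PSPM": [0.0, 11.1, 22.2, 0.0, 11.1, 22.2, 22.2, 0.0, 22.2, 11.1],
    "SPDL": [66.7, 44.4, 44.4, 55.6, 44.4, 33.3, 33.3, 66.7, 33.3, 44.4],
    "PSDSP": [0.0, 11.1, 22.2, 0.0, 11.1, 22.2, 22.2, 0.0, 22.2, 11.1],
    "ISSPM": [100.0, 88.9, 77.8, 100.0, 77.8, 66.7, 77.8, 100.0, 66.7, 88.9],
}


def discovery_rate(found: int, embedded: int) -> float:
    """Percentage of embedded (ground-truth) patterns that an algorithm reports."""
    return 100.0 * found / embedded


for name, rates in TABLE5.items():
    print(f"{name:6s} mean discovery rate = {mean(rates):.2f}%")
```

With these values the means are about 12.21% for PSPM and PSDSP, 46.65% for SPDL and 84.46% for ISSPM. The PSPM and PSDSP columns are identical row for row. Under the nine-pattern reading, `discovery_rate(8, 9)` gives about 88.9, matching entries such as dataset 2 for ISSPM.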
Algorithm | Mining phase | Evaluation phase | Total |
---|---|---|---|
PSPM | 236.2 | | 236.2 |
SPDL | 272.5 | | 272.5 |
PSDSP | 72.7 | 7 163.2 | 7 235.9 |
ISSPM | 215.4 | 581.3 | 796.7 |
Tab. 6 Average running time of each algorithm on synthetic sequential pattern datasets
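In Table 6 the totals for PSDSP and ISSPM equal the sum of their two phases. For PSPM and SPDL only one time is given, which is read here as a mining time with no separate evaluation phase; that is an inference from the layout. The table does not state the time unit. A short sketch of the arithmetic, using hypothetical variable names:

```python
# Average running time per phase, copied from Table 6 (unit not stated in the table)
TABLE6 = {
    "PSPM": (236.2, 0.0),   # assumed: no separate evaluation phase
    "SPDL": (272.5, 0.0),   # assumed: no separate evaluation phase
    "PSDSP": (72.7, 7163.2),
    "ISSPM": (215.4, 581.3),
}

# Total time = mining phase + evaluation phase
totals = {name: mining + evaluation for name, (mining, evaluation) in TABLE6.items()}

# How many times longer PSDSP runs than ISSPM on average
speedup = totals["PSDSP"] / totals["ISSPM"]

# Fraction of ISSPM's total time spent in the evaluation phase
eval_share = TABLE6["ISSPM"][1] / totals["ISSPM"]

print(f"PSDSP / ISSPM total time: {speedup:.2f}")
print(f"ISSPM evaluation share: {eval_share:.1%}")
```

The ratio works out to roughly 9.1, so ISSPM's average total time is about one ninth of PSDSP's, and about 73% of ISSPM's time is spent in evaluation.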