Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2713-2721. DOI: 10.11772/j.issn.1001-9081.2021071311
• Data science and technology •
Jun WU, Aijia OUYANG, Lin ZHANG
Received: 2021-07-19
Revised: 2021-10-22
Accepted: 2021-10-25
Online: 2021-11-10
Published: 2022-09-10
Contact: Jun WU
About author: OUYANG Aijia, born in 1975 in Loudi, Hunan, Ph.D., professor, CCF member. His research interests include intelligent computing and parallel computing.
Jun WU, Aijia OUYANG, Lin ZHANG. Statistically significant sequential patterns mining algorithm under influence degree[J]. Journal of Computer Applications, 2022, 42(9): 2713-2721.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071311
| Sequential pattern | Support/% | Sequential pattern | Support/% |
|---|---|---|---|
| | 13.0 | | 8.6 |
| | 9.8 | | 8.0 |
| | 9.1 | | |
Tab. 1 Top-5 sequential patterns with largest degree of support in ATS dataset
| Sequential record dataset | Records | Items | Average length | Repeated items |
|---|---|---|---|---|
| Book | 788 | 3 844 | 96.5 | Yes |
| Unix | 4 015 | 1 103 | 26.4 | Yes |
| Peptide | 15 784 | 20 | 27.0 | Yes |
| Bike | 21 078 | 67 | 7.3 | Yes |
Tab. 2 Information of real-world sequential record datasets
| Pattern No. | PSPM | SPDL | PSDSP | ISSPM |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
| 6 | | | | |
| 7 | | | | |
Tab. 3 Top 7 2-length sequential patterns with the largest interestingness reported by each algorithm on Book dataset
| Algorithm | Sequential patterns reported | Non-false-positive patterns | False-positive patterns |
|---|---|---|---|
| PSPM | 17 925.6 | 3 176.4 | 14 749.2 |
| SPDL | 10 864.5 | 2 284.6 | 8 579.9 |
| PSDSP | 1 986.4 | 1 896.8 | 89.6 |
| ISSPM_FDR | 1 216.2 | 1 174.9 | 41.3 |
| ISSPM_FWER | 868.7 | 860.6 | 8.1 |
Tab. 4 Number of non-false positive patterns and false positive patterns reported by each algorithm on synthetic sequential pattern datasets
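The counts in Tab. 4 are easier to compare as proportions. The following is a minimal sketch that divides the false-positive count by the number of reported patterns for each algorithm. The dictionary name `TAB4`, the helper `false_positive_share`, and the keys `ISSPM_FDR` and `ISSPM_FWER` (assumed here to denote the FDR- and FWER-controlled variants of ISSPM) are illustrative choices, not names taken from the paper.

```python
# Average counts per algorithm, copied from Tab. 4:
# (patterns reported, non-false-positive patterns, false-positive patterns)
TAB4 = {
    "PSPM": (17925.6, 3176.4, 14749.2),
    "SPDL": (10864.5, 2284.6, 8579.9),
    "PSDSP": (1986.4, 1896.8, 89.6),
    "ISSPM_FDR": (1216.2, 1174.9, 41.3),   # assumed: ISSPM with FDR control
    "ISSPM_FWER": (868.7, 860.6, 8.1),     # assumed: ISSPM with FWER control
}


def false_positive_share(reported: float, false_positives: float) -> float:
    """Fraction of the reported patterns that are false positives."""
    return false_positives / reported


shares = {
    name: false_positive_share(reported, fp)
    for name, (reported, _non_fp, fp) in TAB4.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of reported patterns are false positives")
```

Under these figures, roughly four fifths of the patterns reported by PSPM and SPDL are false positives, against under 5% for PSDSP and both ISSPM variants.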
| Dataset No. | PSPM | SPDL | PSDSP | ISSPM |
|---|---|---|---|---|
| 1 | 0.0 | 66.7 | 0.0 | 100.0 |
| 2 | 11.1 | 44.4 | 11.1 | 88.9 |
| 3 | 22.2 | 44.4 | 22.2 | 77.8 |
| 4 | 0.0 | 55.6 | 0.0 | 100.0 |
| 5 | 11.1 | 44.4 | 11.1 | 77.8 |
| 6 | 22.2 | 33.3 | 22.2 | 66.7 |
| 7 | 22.2 | 33.3 | 22.2 | 77.8 |
| 8 | 0.0 | 66.7 | 0.0 | 100.0 |
| 9 | 22.2 | 33.3 | 22.2 | 66.7 |
| 10 | 11.1 | 44.4 | 11.1 | 88.9 |
Tab. 5 Embedded patterns discovery rate reported by each algorithm on synthetic sequential pattern datasets
| Algorithm | Mining phase | Evaluation phase | Total |
|---|---|---|---|
| PSPM | 236.2 | | 236.2 |
| SPDL | 272.5 | | 272.5 |
| PSDSP | 72.7 | 7 163.2 | 7 235.9 |
| ISSPM | 215.4 | 581.3 | 796.7 |
Tab. 6 Average running time of each algorithm on synthetic sequential pattern datasets