Statistically significant sequential patterns mining algorithm under influence degree

Jun WU, Aijia OUYANG, Lin ZHANG

  1. School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563006, China
  • Received: 2021-07-19 Revised: 2021-10-22 Accepted: 2021-10-25 Online: 2021-11-10 Published: 2022-09-10
  • Contact: Jun WU
  • About author: OUYANG Aijia, born in 1975, Ph. D., professor. His research interests include intelligent computing and parallel computing.
    ZHANG Lin, born in 1984, M. S., associate professor. Her research interests include data mining.
  • Supported by:
    National Natural Science Foundation of China (62066049); Joint Fund Program of Zunyi Science and Technology Bureau (ZSKHHZ(2022)123)


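  • About author: OUYANG Aijia, born in 1975 in Loudi, Hunan, male, professor, Ph. D., CCF member. His main research interests include intelligent computing and parallel computing.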


Traditional sequential pattern mining algorithms have two shortcomings: support is a poor indicator of how interesting a sequential pattern is, and the quality of the reported patterns is never evaluated. To address them, a statistically significant sequential pattern mining algorithm under influence degree, called ISSPM (Influence-based Significant Sequential Patterns Mining), was proposed. Firstly, all sequential patterns satisfying the interestingness constraint were mined recursively. Then, an itemset permuting method was introduced to construct a permutation-test null distribution for these patterns. Finally, the statistical measures of the evaluated patterns were calculated from this distribution, and the statistically significant patterns were selected from the mined candidates. In experiments on real-world sequential record datasets, ISSPM reported fewer but more interesting sequential patterns than the PSPM (Prefix-projected Sequential Patterns Mining), SPDL (Sequential Patterns Discovering under Leverage) and PSDSP (Permutation Strategies for Discovering Sequential Patterns) algorithms. Experimental results on synthetic sequential record datasets show that the average proportion of false positive sequential patterns reported by ISSPM is 3.39% and that its discovery rate of embedded patterns is no less than 66.7%, both significantly better than those of the three comparison algorithms. The statistically significant sequential patterns reported by ISSPM therefore reflect more valuable information in sequential record datasets, and decisions based on this information are more reliable.
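
The permutation-test step can be illustrated with a minimal Python sketch. It is not the ISSPM implementation: the abstract defines neither the influence degree nor the exact itemset permuting scheme, so the sketch assumes that permuting means shuffling the order of itemsets within each sequence, and it uses plain support as a stand-in measure. All function names (`contains`, `support`, `permute_itemsets`, `permutation_p_value`) and the toy database are hypothetical.

```python
import random


def contains(sequence, pattern):
    """Return True if `pattern` (a list of itemsets) occurs in `sequence`
    as an order-preserving subsequence, matching itemsets by inclusion."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(db, pattern):
    """Fraction of sequences in `db` that contain `pattern`.
    Stand-in measure only; the paper uses an influence-based measure."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


def permute_itemsets(db, rng):
    """Assumed null model: shuffle the itemset order inside each sequence,
    which keeps every itemset intact but destroys sequential structure."""
    return [rng.sample(seq, len(seq)) for seq in db]


def permutation_p_value(db, pattern, n_perm=200, seed=0, measure=support):
    """Empirical p-value: share of permuted databases whose measure is at
    least the observed one, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = measure(db, pattern)
    hits = sum(
        measure(permute_itemsets(db, rng), pattern) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy database: in every sequence, item "a" precedes item "b".
a, b, c, d = (frozenset(x) for x in "abcd")
db = [[a, c, d, b] for _ in range(20)]

ordered = [a, b]   # depends on the order of itemsets
single = [a]       # does not depend on the order of itemsets

p_ordered = permutation_p_value(db, ordered)
p_single = permutation_p_value(db, single)
print(p_ordered, p_single)
```

In this toy setting the ordered pattern receives a small p-value because shuffling rarely reproduces its observed support, whereas the single-itemset pattern is unaffected by shuffling and is never judged significant. A pattern would be reported only if its p-value fell below a chosen significance level, typically after a multiple-testing correction, which the sketch omits.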

Key words: data mining, sequential pattern mining, interestingness measure, statistically significant pattern, permutation test




