Journal of Computer Applications

• Artificial intelligence and simulation •

Statistically significant sequential patterns mining algorithm under influence

WU Jun, OUYANG Aijia, ZHANG Lin

  1. Zunyi Normal University
  • Received: 2021-07-20 Revised: 2021-10-22 Online: 2021-11-10 Published: 2021-11-10
  • Corresponding author: WU Jun

Abstract: Traditional sequential pattern mining algorithms have two problems: the support measure does not faithfully reflect the interestingness of a sequential pattern, and the quality of the reported patterns is not evaluated. To address them, an influence-based statistically significant sequential pattern mining algorithm called ISSPM (Influence-based Significant Sequential Patterns Mining) is proposed. First, all sequential patterns that meet the minimum interestingness constraint are mined recursively. Then, an itemset permuting method is used to construct the permutation-test null distribution for these sequential patterns. Finally, the statistic value of each evaluated sequential pattern is calculated from that null distribution, and all statistically significant sequential patterns are reported. Experiments on real-world sequential data sets show that, compared with the PSPM (Prefix-projected Sequential Patterns Mining), SPDL (Sequential Patterns Discovering under Leverage) and PSDSP (Permutation Strategies for Discovering Sequential Patterns) algorithms, ISSPM reports fewer but more interesting sequential patterns. Experiments on synthetic sequential data sets show that false positive sequential patterns account for 3.39% of the patterns reported by ISSPM on average, and that its discovery rate of embedded patterns is never lower than 66.7%; both results are clearly better than those of the three comparison algorithms. Therefore, the statistically significant sequential patterns reported by ISSPM reflect more valuable information in sequential data sets, and further analysis and decisions based on that information are more reliable.
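The permutation-test step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the influence measure or the exact itemset permuting scheme, so the sketch assumes plain support as a stand-in statistic and shuffles the order of itemsets within each sequence. All function names are hypothetical, and this is not the ISSPM implementation.

```python
import random

def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered
    subsequence, where each pattern itemset is a subset of a sequence itemset."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(database, pattern):
    """Fraction of sequences that contain `pattern`.
    ASSUMPTION: used here in place of the paper's influence measure."""
    return sum(contains(seq, pattern) for seq in database) / len(database)

def permute_itemsets(database, rng):
    """ASSUMPTION: 'itemset permuting' is modelled as shuffling the order
    of itemsets inside every sequence, which keeps itemsets intact."""
    return [rng.sample(seq, len(seq)) for seq in database]

def permutation_p_value(database, pattern, n_perm=1000, seed=0):
    """Empirical p-value: share of permuted data sets whose statistic is at
    least the observed one, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = support(database, pattern)
    hits = sum(
        support(permute_itemsets(database, rng), pattern) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
    database = [[a, b, c] for _ in range(20)]
    # The order "a then b" holds in every sequence, so it is unlikely by chance.
    print(permutation_p_value(database, [a, b], n_perm=200))
    # A single-itemset pattern is unaffected by reordering.
    print(permutation_p_value(database, [a], n_perm=200))
```

A pattern would be reported as statistically significant when its p-value falls below a chosen threshold. How that threshold is set, and whether any multiple-testing correction is applied, is not stated in the abstract.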
