Rare sequential pattern mining method with adaptive gap under one-off condition

doi:10.11772/j.issn.1001-9081.2025030305

Journal of Computer Applications (sole official website)


LI Hao¹, WANG Lei², SUN Le², WU Youxi¹

1. School of Artificial Intelligence and Data Science, Hebei University of Technology 2. Procuratorial Technology Department, Tianjin Municipal People's Procuratorate

Received: 2025-03-24 Revised: 2025-06-16 Online: 2025-07-07 Published: 2025-07-07
Corresponding author: WU Youxi
About the authors: LI Hao, born in 2000, from Zhangjiakou, Hebei, M. S. candidate. His research interests include data mining. WANG Lei, born in 1980, from Tianjin, M. S. His research interests include data analysis. SUN Le, born in 1986, from Shijiazhuang, Hebei, M. S. His research interests include data analysis. WU Youxi, born in 1974, from Tianjin, Ph. D., professor, CCF distinguished member. His research interests include data mining and machine learning.
Supported by:
National Natural Science Foundation of China (62372154)

Abstract

Rare sequential pattern mining aims to discover important patterns that occur infrequently in sequence databases. However, most existing sequential pattern mining methods use a 0-or-1 judgment: they only determine whether a pattern occurs in a sequence and ignore how often the pattern repeats within that sequence, which reflects the user's level of interest. This biases the mining results. To address this issue, a rare sequential pattern mining method with adaptive gap under the one-off condition was proposed, which uses the one-off condition to count the repetitions of a pattern in a sequence and uses adaptive gaps to reflect sequence features. To avoid the inefficient sequential traversal of the original database that traditional algorithms require during support calculation, an inverted index structure was established that stores each item together with its positions in the original database, which eliminates redundant traversals and improves the efficiency of support calculation. In candidate pattern generation, a pattern join strategy was used to generate candidate patterns, and on this basis a pruning strategy was proposed to further reduce the number of candidate patterns and thus improve the mining speed. Experiments were conducted on five real datasets. The results show that the proposed method runs significantly faster than the competing methods, which verifies its superiority.

Key words: rare sequential pattern mining, adaptive gap, one-off condition, support calculation, pruning strategy

CLC number: TP311.13
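
The abstract names two building blocks, an inverted index from items to positions and support counting under the one-off condition with adaptive gaps, without giving their details. The following is only a minimal sketch of how these ideas might fit together for a single sequence, not the authors' algorithm. The function names `build_inverted_index` and `one_off_support` are hypothetical, the gap between consecutive pattern items is assumed to be unconstrained, and the leftmost-first greedy matching is an assumption that may undercount compared with an exact method.

```python
from bisect import bisect_right
from collections import defaultdict


def build_inverted_index(sequence):
    """Map each item to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, item in enumerate(sequence):
        index[item].append(pos)
    return index


def one_off_support(pattern, index):
    """Greedily count occurrences of `pattern` using each position at most once.

    Assumption: gaps between consecutive pattern items are unconstrained,
    which is how "adaptive gap" is interpreted in this sketch.
    """
    if not pattern:
        return 0
    # Work on copies so that consumed positions can be removed safely.
    free = {item: list(index.get(item, [])) for item in set(pattern)}
    count = 0
    while True:
        prev = -1
        chosen = []
        for item in pattern:
            positions = free[item]
            # First still-unused position strictly after the previous match.
            k = bisect_right(positions, prev)
            if k == len(positions):
                return count  # no further occurrence can be completed
            prev = positions[k]
            chosen.append((item, prev))
        # One-off condition: positions of a found occurrence are never reused.
        for item, pos in chosen:
            free[item].remove(pos)
        count += 1


index = build_inverted_index("abcabcab")
print(one_off_support("ab", index))
```

Here the support is computed from the position lists alone, so the original sequence is scanned only once, when the index is built. For the sequence `abcabcab`, the sketch finds three occurrences of `ab` that share no position.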

LI Hao, WANG Lei, SUN Le, WU Youxi. Rare sequential pattern mining method with adaptive gap under one-off condition[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025030305.

