Journal of Computer Applications (《计算机应用》), 2026, Vol. 46, Issue 3: 775-780. DOI: 10.11772/j.issn.1001-9081.2025030305

• Data Science and Technology •

一次性条件下自适应间隙稀有序列模式挖掘方法

李昊1, 王磊2, 孙乐2, 武优西1

  1. 河北工业大学 人工智能与数据科学学院,天津 300401
  2. 天津市人民检察院 检察技术部,天津 300222
  • 收稿日期:2025-03-25 修回日期:2025-06-23 接受日期:2025-06-25 发布日期:2025-07-07 出版日期:2026-03-10
  • 通讯作者: 武优西
  • 作者简介:李昊(2000—),男,河北张家口人,硕士研究生,主要研究方向:数据挖掘
    王磊(1980—),男,天津人,硕士,主要研究方向:数据分析
    孙乐(1986—),男,河北石家庄人,硕士,主要研究方向:数据分析
  • 基金资助:
    国家自然科学基金资助项目(62372154)

Rare sequential pattern mining method with adaptive gap under one-off condition

Hao LI1, Lei WANG2, Le SUN2, Youxi WU1

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  2. Procuratorial Technology Department, Tianjin Municipal People's Procuratorate, Tianjin 300222, China
  • Received: 2025-03-25; Revised: 2025-06-23; Accepted: 2025-06-25; Online: 2025-07-07; Published: 2026-03-10
  • Corresponding author: Youxi WU
  • About author:LI Hao, born in 2000, M. S. candidate. His research interests include data mining.
    WANG Lei, born in 1980, M. S. His research interests include data analysis.
    SUN Le, born in 1986, M. S. His research interests include data analysis.
  • Supported by:
    National Natural Science Foundation of China (62372154)

摘要:

稀有序列模式挖掘旨在发现序列库中不频繁出现的重要模式。然而,现有序列模式方法多采用0或1的判别方式,即判断模式是否在序列中出现,忽略模式在序列中的重复性,即用户的感兴趣程度,导致挖掘结果的偏差。为了解决这一问题,提出一次性条件下自适应间隙稀有序列模式挖掘方法ORP(One-off Rare sequential Pattern mining)。采用一次性条件计算模式在序列中的重复次数,并采用自适应间隙反映序列特征。为了避免传统算法在支持度计算过程中需要对原始数据库进行低效顺序遍历的问题,建立一个倒排索引结构。该结构存储每个事件及其在原始数据库中出现位置的信息,避免了对原始数据库进行冗余遍历的问题,从而提高支持度计算的效率。此外,在候选模式的生成过程中,使用模式连接策略生成候选模式,并在此基础上提出一种剪枝策略进一步减少候选模式的数量,从而提高挖掘速度。在5个真实数据集上的消融实验结果表明,所提方法的运行时间明显更短,从而验证了该方法的优越性。

关键词: 稀有序列模式挖掘, 自适应间隙, 一次性条件, 支持度计算, 剪枝策略

Abstract:

Rare sequential pattern mining aims to discover important patterns that occur infrequently in sequence databases. However, most existing sequential pattern mining methods use a binary (0 or 1) criterion that only checks whether a pattern occurs in a sequence. They ignore how often the pattern repeats within the sequence, which reflects the user's level of interest, and this biases the mining results. To address this issue, a rare sequential pattern mining method with adaptive gap under the one-off condition, named ORP (One-off Rare sequential Pattern mining), was proposed. In this method, the one-off condition was used to count the repetitions of a pattern in a sequence, and adaptive gaps were used to reflect the characteristics of the sequence. To avoid the inefficient sequential traversal of the original database that traditional algorithms require during support calculation, an inverted index structure was built. This structure stores each event together with the positions at which it occurs in the original database, which removes redundant traversals of the database and improves the efficiency of support calculation. In addition, candidate patterns were generated with a pattern connection strategy, and a pruning strategy was proposed on this basis to further reduce the number of candidate patterns and thus speed up mining. Ablation experiments on five real datasets show that the proposed method has a significantly shorter running time, which verifies its superiority.
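
The two ideas in the abstract that most affect support calculation, the inverted index and one-off counting with adaptive gaps, can be illustrated with a minimal Python sketch. It is not the ORP implementation: the function names (`build_inverted_index`, `one_off_support`, `support`), the greedy leftmost matching, and the choice to sum per-sequence counts into a database-level support are all assumptions made for illustration. Sequences are modelled as strings whose characters are events.

```python
from bisect import bisect_right
from collections import defaultdict


def build_inverted_index(database):
    """Scan the database once: event -> {sequence id -> ascending positions}."""
    index = defaultdict(lambda: defaultdict(list))
    for sid, sequence in enumerate(database):
        for pos, event in enumerate(sequence):
            index[event][sid].append(pos)
    return index


def one_off_support(pattern, index, sid):
    """Greedily count occurrences of `pattern` in sequence `sid`.

    One-off condition: each position belongs to at most one occurrence.
    Adaptive gap: consecutive pattern events may be any distance apart.
    Greedy leftmost matching is an assumption of this sketch.
    """
    if not pattern:
        return 0
    # Position lists come from the index, so the raw sequence is not rescanned.
    lists = [index.get(event, {}).get(sid, []) for event in pattern]
    used = set()   # positions already consumed by earlier occurrences
    count = 0
    while True:
        prev = -1
        occurrence = []
        for positions in lists:
            i = bisect_right(positions, prev)  # first position after prev
            while i < len(positions) and positions[i] in used:
                i += 1                         # skip consumed positions
            if i == len(positions):
                return count                   # no further occurrence exists
            prev = positions[i]
            occurrence.append(prev)
        used.update(occurrence)
        count += 1


def support(pattern, index, num_sequences):
    """Assumed database-level support: sum of per-sequence one-off counts."""
    return sum(one_off_support(pattern, index, sid)
               for sid in range(num_sequences))


database = ["aabb", "abcabc", "ba"]  # toy data, not from the paper
index = build_inverted_index(database)
print(support("ab", index, len(database)))
```

On this toy database the pattern `ab` occurs twice in each of the first two sequences and never in the third, so the sketch reports a support of 4, whereas a binary presence test would count only 2 sequences. That difference is the repetition information the abstract says binary methods discard.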

Key words: rare sequential pattern mining, adaptive gap, one-off condition, support calculation, pruning strategy

CLC number: