Journal of Computer Applications (《计算机应用》), sole official website


Rare sequential pattern mining method with adaptive gap under one-off condition

LI Hao1, WANG Lei2, SUN Le2, WU Youxi1

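  1. School of Artificial Intelligence and Data Science, Hebei University of Technology; 2. Procuratorial Technology Department, Tianjin Municipal People's Procuratorate
  • Received: 2025-03-24 Revised: 2025-06-16 Online: 2025-07-07 Published: 2025-07-07
  • Corresponding author: WU Youxi
  • About the authors: LI Hao (born 2000), male, from Zhangjiakou, Hebei, M.S. candidate; main research interest: data mining. WANG Lei (born 1980), male, from Tianjin, M.S.; main research interest: data analysis. SUN Le (born 1986), male, from Shijiazhuang, Hebei, M.S.; main research interest: data analysis. WU Youxi (born 1974), male, from Tianjin, professor, Ph.D., CCF Distinguished Member; main research interests: data mining and machine learning.
  • Supported by:
    National Natural Science Foundation of China (62372154)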

Rare sequential pattern mining method with adaptive gap under one-off condition

LI Hao1, WANG Lei2, SUN Le2, WU Youxi1   

  1. College of Artificial Intelligence, Hebei University of Technology; 2. Procuratorial Technology Department, Tianjin Municipal People's Procuratorate
  • Received: 2025-03-24 Revised: 2025-06-16 Online: 2025-07-07 Published: 2025-07-07
  • About author:LI Hao, born in 2000, M. S. candidate. His research interest includes data mining. WANG Lei, born in 1980, M. S. His research interest includes data analysis. SUN Le, born in 1986, M. S. His research interest includes data analysis. WU Youxi, born in 1974, Ph. D., professor. His research interests include data mining, machine learning.
  • Supported by:
    National Natural Science Foundation of China (62372154)

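Abstract: Rare sequential pattern mining aims to discover important patterns that occur infrequently in a sequence database. However, most existing sequential pattern mining methods use a binary (0 or 1) criterion: they only determine whether a pattern occurs in a sequence and ignore how often the pattern repeats within that sequence, which reflects the user's level of interest. This biases the mining results. To address this problem, a rare sequential pattern mining method with adaptive gap under the one-off condition is explored, in which the one-off condition is used to count the repetitions of a pattern in a sequence and adaptive gaps are used to reflect sequence characteristics. To avoid the inefficient sequential traversal of the original database that traditional algorithms require during support calculation, an inverted index structure is built that stores each item together with its occurrence positions in the original database, which avoids redundant traversals and improves the efficiency of support calculation. During candidate generation, a pattern join strategy is used to generate candidate patterns; on this basis, a pruning strategy is proposed to further reduce the number of candidate patterns and thus speed up mining. Experiments on five real datasets show that the proposed method runs in markedly less time than the compared methods, which verifies the advantage of the proposed algorithm.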

Key words: rare sequential pattern mining, adaptive gap, one-off condition, support calculation, pruning strategy

Abstract: Rare sequential pattern mining aims to discover infrequent but important patterns in sequence databases. However, most current sequential pattern mining methods only determine whether a pattern occurs in a sequence, ignoring the repetition of the pattern within the sequence, that is, the user's level of interest, which biases the mining results. To tackle this issue, a rare sequential pattern mining method with adaptive gap under the one-off condition was proposed, which counts the repetitions of a pattern in a sequence using the one-off condition and reflects the sequence features using adaptive gaps. To avoid the inefficient sequential traversal of the original database that traditional algorithms perform during support calculation, an inverted index structure was established, which stores each item and its locations in the original database, thereby avoiding redundant traversal of the original database and improving the efficiency of support calculation. In the process of candidate pattern generation, a pattern join strategy was used to generate candidate patterns. To further reduce the number of candidate patterns, a pruning strategy was proposed, thereby improving the mining speed. Experiments were conducted on five real datasets. The experimental results showed that the proposed algorithm ran markedly faster than the competing algorithms, thus verifying its superiority.
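The inverted index and the one-off counting mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `build_inverted_index`, `one_off_support`, and `database_support` are hypothetical, the index is built per sequence, and occurrences are matched greedily from the left. The abstract does not describe the paper's actual support algorithm, so this should not be read as the authors' method.

```python
from bisect import bisect_right
from collections import defaultdict


def build_inverted_index(sequence):
    """Map each item to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, item in enumerate(sequence):
        index[item].append(pos)
    return dict(index)


def one_off_support(index, pattern):
    """Greedily count occurrences of `pattern` using only the index.

    Any gap is allowed between consecutive pattern items (adaptive gap),
    and each sequence position is used at most once (one-off condition).
    Greedy leftmost matching is an assumption of this sketch.
    """
    if not pattern:
        return 0
    used = set()  # positions consumed by occurrences found so far
    count = 0
    while True:
        prev = -1
        occurrence = []
        for item in pattern:
            positions = index.get(item, [])
            i = bisect_right(positions, prev)  # first position after prev
            while i < len(positions) and positions[i] in used:
                i += 1  # skip positions already consumed
            if i == len(positions):
                return count  # the greedy search cannot complete another occurrence
            prev = positions[i]
            occurrence.append(prev)
        used.update(occurrence)
        count += 1


def database_support(sequences, pattern):
    """Sum the per-sequence counts over a small sequence database."""
    return sum(one_off_support(build_inverted_index(s), pattern) for s in sequences)
```

With the sequence "aabb" and the pattern "ab", the sketch matches positions (0, 2) and then (1, 3), so the count is 2, whereas a binary occurs-or-not criterion would record only 1. The position lookups go through the index lists rather than rescanning the sequence.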

Key words: rare sequential pattern mining, adaptive gap, one-off condition, support calculation, pruning strategy
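The pattern join strategy named in the abstract is commonly understood as follows: two patterns of length m are joined when the suffix of the first (without its first item) equals the prefix of the second (without its last item), giving a candidate of length m + 1. The sketch below shows only that generic join, with patterns represented as strings; the function name `pattern_join` is hypothetical, and the paper's pruning strategy is not described in the abstract, so it is deliberately left out.

```python
from collections import defaultdict


def pattern_join(patterns):
    """Join equal-length patterns p and q into p + q[-1] when p[1:] == q[:-1]."""
    by_prefix = defaultdict(list)
    for q in patterns:
        by_prefix[q[:-1]].append(q)  # group patterns by their prefix
    candidates = []
    for p in patterns:
        # every q whose prefix equals the suffix of p extends p by one item
        for q in by_prefix.get(p[1:], []):
            candidates.append(p + q[-1:])
    return candidates
```

For instance, joining the patterns "ab", "bc", and "ba" yields the candidates "abc", "aba", and "bab". Grouping by prefix avoids comparing every pair of patterns; which patterns are allowed into the join in the first place would be decided by a pruning rule such as the one the paper proposes.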

CLC number: