Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (09): 2405-2407.

• Database Technology •

Sequential patterns mining algorithm based on improved PrefixSpan

GONG Wei1,2, LIU Pei-yu1,2, JIA Xian1,2

  1. School of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan Shandong 250014, China
  • Received: 2011-03-07 Revised: 2011-06-16 Online: 2011-09-01 Published: 2011-09-01
  • Contact: GONG Wei
  • About the authors: GONG Wei (1987-), male, born in Zibo, Shandong, master's student; main research interests: network information security, network security auditing;
    LIU Pei-yu (1960-), male, born in Weifang, Shandong, professor, doctoral supervisor, CCF senior member; main research interests: computer network information security, network system planning, network information resource development, software development;
    JIA Xian (1984-), female, born in Heze, Shandong, master's student; main research interests: network information security, network security auditing.
  • Supported by:
    National Natural Science Foundation of China (61003131; 61003138; 61073116); Natural Science Foundation of Shandong Province (ZR2009GM009); Natural Science Foundation of Shandong Province (ZR2009GZ007); Science and Technology Program of the Shandong Provincial Department of Education (J09LG52)

Abstract: PrefixSpan, the classic sequential pattern mining algorithm, incurs a high cost in constructing projected databases. To address this problem, a sequential pattern mining algorithm named SPMIP, based on an improved PrefixSpan, was proposed. The algorithm adds a pruning step and skips the scan when generating certain specific sequential patterns, which reduces both the size of the projected databases and the time spent scanning them. This improves efficiency while still yielding the required sequential patterns. The experimental results show that SPMIP is more efficient than PrefixSpan and that the sequential patterns obtained are unaffected.

Key words: PrefixSpan, sequential pattern, projected database, pruning, scanning
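
For orientation, the baseline that SPMIP modifies can be sketched as follows. This is a minimal illustration of standard PrefixSpan only; the abstract does not state SPMIP's pruning rule or which scans it skips, so neither is reproduced here. The names `prefixspan`, `mine`, `db`, and `min_support` are illustrative assumptions, each sequence element is assumed to be a single item rather than an itemset, and projected databases are held as (sequence index, offset) pairs instead of copied suffixes.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Baseline PrefixSpan sketch for sequences of single items.

    db          -- list of sequences, each a list of hashable, sortable items
    min_support -- minimum number of sequences that must contain a pattern
    Returns a dict mapping each frequent pattern (tuple) to its support.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` is the projected database for `prefix`, stored as
        # (sequence index, start offset) pairs pointing at each suffix.
        # Scan step: count every item once per projected sequence.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue  # infrequent extension, not grown further
            pattern = prefix + (item,)
            results[pattern] = support

            # Projection step: build the projected database for the
            # extended prefix. This construction and the scan above are
            # the costs the abstract says SPMIP aims to reduce.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


# Hypothetical toy database, used only to exercise the sketch.
db = [list("abcb"), list("abbca"), list("bca")]
patterns = prefixspan(db, min_support=2)
```

Every recursive call performs one scan and up to one projection per frequent item, which is why the number and size of projected databases dominate the running time of the baseline.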

CLC Number: