计算机应用 ›› 2011, Vol. 31 ›› Issue (05): 1391-1394.DOI: 10.3724/SP.J.1087.2011.01391

• 数据库技术 • 上一篇    下一篇

无项头表的FP-Growth算法

凌绪雄1,2,王社国1,李洋2,3, 苗再2   

  1. 1.河北工程大学 信息与电气工程学院,河北 邯郸056000
    2.浪潮集团博士后工作站, 济南250101
    3.山东大学 博士后流动站,济南 250101
  • 收稿日期:2010-12-01 修回日期:2011-01-12 发布日期:2011-05-01 出版日期:2011-05-01
  • 通讯作者: 凌绪雄
  • 作者简介:凌绪雄(1985-),男,四川凉山人,硕士,主要研究方向:数据挖掘、语音识别;王社国(1967-),男,河北永年人,副教授,主要研究方向:语音识别、图像处理、数据库;李 洋(1978-),男,黑龙江哈尔滨人,博士,主要研究方向:神经网络、专家系统、数据挖掘;苗再良(1961-),男,山东潍坊人,高级研究员,主要研究方向:智能化移动互联网、告警监控。

No-header-table FP-Growth algorithm

LING Xu-xiong1,2, WANG She-guo1, LI Yang2,3, MIAO Zai-liang2   

  1. 1. College of Information and Electrical Engineering, Hebei University of Engineering, Handan Hebei 056000, China
    2.INSPUR Post Doctors Work Station, Jinan Shandong 250101, China
    3. Post-doctoral Station, Shandong University, Jinan Shandong 250101, China
  • Received:2010-12-01 Revised:2011-01-12 Online:2011-05-01 Published:2011-05-01
  • Contact: Xu-Xiong LING

摘要: 针对FP-Growth算法中频繁模式树的遍历低效问题,提出了一种无项头表的频繁模式增长算法。该算法利用递归回溯的方式遍历频繁模式树以求取条件模式基,解决了对同一树路径多次重复遍历的问题。从理论分析和实际挖掘能力两方面,将新算法与FP-Growth算法进行了对比。结果表明,新算法有效减少了条件模式基的搜索开销,使频繁模式挖掘的效率提高了2~5倍,在时间和空间性能上均优于FP-Growth算法。将该算法应用于通信告警关联规则挖掘,较快地挖掘出了关联规则结果,且正确规则的覆盖率达到了83.3%。

关键词: 项头表, 频繁模式, 关联规则, 告警关联, 数据挖掘

Abstract: Concerning the problem of low traversal efficiency when searching the FP-tree for conditional pattern bases, a new no-header-table FP-Growth algorithm was proposed. The algorithm employed a recursively backtracking way to search for conditional pattern bases, avoiding traversing the same FP-tree path multiple times. Compared with FP-Growth algorithm in terms of theoretical analysis and actual mining performance, this algorithm greatly reduced the searching cost and improved the mining efficiency of frequent patterns by 2-5 times. Finally, the algorithm was used to mine association rules in telecommunication network alarms. The high mining speed, with the coverage of 83.3% against correct rules, shows that it is superior to FP-Growth both in time and space performance.

Key words: item header table, frequent pattern, association rule, alarm association, data mining

中图分类号: