Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (01): 101-103.

• Artificial Intelligence •

Frequent pattern mining algorithm based on improved FP-tree

Li Yebai (李也白)1, Tang Hui (唐辉)2, He Yuming (贺玉明)3

  1. Institute of Computer Application Technology, North China University of Technology
  2. North China University of Technology, Beijing
  3. North China University of Technology
  • Received: 2010-06-18 Revised: 2010-07-26 Online: 2011-01-12 Published: 2011-01-01
  • Corresponding author: He Yuming

Abstract: FP-growth is an efficient frequent pattern mining algorithm built on the FP-tree data structure, and it generates no candidate sets. Constructing the frequent pattern tree (FP-tree) requires two scans of the database, and the second scan still reads transactions that contain only infrequent items. To address this problem, after an in-depth analysis of the properties of the FP-tree, we improve the FP-tree construction process and introduce an auxiliary storage structure based on a hash table, which reduces item lookup time and improves mining efficiency.

Key words: data mining, association rule, frequent pattern, FP-growth, FP-tree
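
The two-scan construction summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the improved construction procedure or the layout of the hash-based auxiliary structure, so the names `FPNode` and `build_fp_tree`, the use of a per-node `dict` as the hash table, and the tie-breaking rule are all assumptions made for illustration. The sketch only skips the insertion work for transactions that contain no frequent item; how the paper avoids reading them in the second pass is not described here.

```python
from collections import Counter


class FPNode:
    """One FP-tree node (hypothetical layout, not taken from the paper)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        # Assumption: children live in a dict (a hash table) keyed by item,
        # so finding the child for an item is an average O(1) lookup.
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree in two passes over `transactions`.

    Returns (root, header, skipped), where `header` maps each frequent item
    to the list of nodes carrying it and `skipped` counts the transactions
    that contained no frequent item.
    """
    # Scan 1: count the support of every item (once per transaction).
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode()
    header = {}
    skipped = 0

    # Scan 2: insert only the frequent items of each transaction.
    for t in transactions:
        items = [i for i in set(t) if i in frequent]
        if not items:
            # Transaction holds only infrequent items: nothing to insert.
            skipped += 1
            continue
        # Order by descending support; ties broken by item name (assumption).
        items.sort(key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)  # hash lookup, no sibling scan
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header, skipped
```

Calling `build_fp_tree(transactions, min_support)` on a list of item lists returns the tree root, the header table, and the number of skipped transactions; the mining step of FP-growth itself is outside the scope of this sketch.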