计算机应用 ›› 2010, Vol. 30 ›› Issue (07): 1922-1925.

• 数据库技术 • 上一篇    下一篇

关联规则中FP-tree的最大频繁模式非检验挖掘算法

惠亮1,钱雪忠2   

  1. 1. 江南大学信息工程学院
    2. 江南大学
  • 收稿日期:2010-01-04 修回日期:2010-03-08 发布日期:2010-07-01 出版日期:2010-07-01
  • 通讯作者: 惠亮
  • 基金资助:
    江苏省自然科学基金

Non-check mining algorithm of maximum frequent patterns in association rules based on FP-tree

  • Received:2010-01-04 Revised:2010-03-08 Online:2010-07-01 Published:2010-07-01

摘要: 基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、做超集检验等问题,在分析DMFIA-1算法的基础上,提出了最大频繁模式的非检验挖掘算法NCMFP。该算法改进了FP-tree的结构,使挖掘过程中不需要生成条件频繁模式树也不需要超集检验。算法采用的预测剪枝策略减少了挖掘的次数,采用的求取公共交集的方式保证了挖掘结果的完整性。实验结果表明在支持度相对较小情况下,NCMFP的效率是同类算法的2~5倍。

关键词: 关联规则, 数据挖掘, 频繁模式树, 最大频繁项集, 超集检验

Abstract: The algorithms based on FP-tree, for mining maximal frequent patterns, have high performance but with many drawbacks. For example, they must recursively generate conditional FPtrees, have to do the process of superset checking. In order to overcome these drawbacks of the existing algorithms, an algorithm NonCheck Mining algorithm of Maximum Frequent Pattern (NCMFP)for mining maximal frequent patterns was put forward after the analysis of DMFIA-1 algorithm. In the algorithm, neither constructing conditional frequent pattern tree recursively nor superset checking was needed through modifying the structure of FP-tree. This algorithm reduced the number of mining through early prediction before mining. The application of a method to get the public intersection sets could obtain a complete result. The experiment shows that the efficiency of NCMFP is two to five times as much as that of the similar algorithms in the case of a relatively small support.

Key words: association rules, data mining, Frequent Pattern Tree(FP-tree), maximal frequent itemsets, Superset Checking