Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (05): 1339-1343.DOI: 10.3724/SP.J.1087.2011.01339

• Database technology • Previous Articles     Next Articles

Algorithm for mining maximum frequent itemsets based on decreasing dimension of frequent itemset in association rules

QIAN Xue-zhong, HUI Liang   

  1. School of Internet of Things Engineering,Jiangnan University, Wuxi Jiangsu 214122, China
  • Received:2010-10-11 Revised:2010-11-14 Online:2011-05-01 Published:2011-05-01

关联规则中基于降维的最大频繁模式挖掘算法

钱雪忠,惠亮   

  1. 江南大学 物联网工程学院,江苏 无锡 214122
  • 通讯作者: 惠亮
  • 作者简介:钱雪忠(1967-),男,江苏无锡人,副教授,硕士,主要研究方向:数据库、数据挖掘、网络安全;惠亮(1983-),男,江苏邳州人,硕士研究生,主要研究方向:数据库、数据挖掘。
  • 基金资助:

    江苏省自然科学基金资助项目(BK20003017)。

Abstract: These algorithms based on FP-tree, for mining maximal frequent pattern, have high performance but with many drawbacks. For example, they must recursively generate conditional FP-trees and many candidate maximum frequent itemsets. In order to overcome these drawbacks of the existing algorithms, an algorithm named Based on Dimensionality Reduction of Frequent Itemset (BDRFI) for mining maximal frequent patterns was put forward after the analysis of FPMax and DMFIA algorithms. The new algorithm was based on decreasing dimension of itemset. In order to enhance efficiency of superset checking, the algorithm used Digital Frequent Pattern Tree (DFP-tree) instead of FP-tree, and reduced the number of mining through prediction and pruning before mining. During the mining process, a strategy of decreasing dimension of frequent itemset was used to generate candidate frequent itemsets. The method not only reduced the number of candidate frequent itemsets but also can avoid creating conditional FP-tree separately and recursively. The experimental results show that the efficiency of BDRFI is 2-8 times as much as that of other similar algorithms.

Key words: association rule, data mining, maximum frequent itemset, Frequent Pattern tree (FP-tree), decreasing dimension

摘要: 基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、产生大量候选最大频繁项集等问题,在分析FPMax、DMFIA算法的基础上,提出基于降维的最大频繁模式挖掘算法(BDRFI)。该算法改传统的FP-tree为数字频繁模式树DFP-tree,提高了超集检验的效率;采用的预测剪枝策略减少了挖掘的次数;基于降低项集维度的挖掘方式,减少了候选项的数目,避免了递归地产生条件频繁模式树,提高了算法的效率。实验结果表明,BDRFI的效率是同类算法的2~8倍。

关键词: 关联规则, 数据挖掘, 最大频繁项集, 频繁模式树, 降维