计算机应用 ›› 2010, Vol. 30 ›› Issue (3): 806-809.

• 数据库与数据挖掘 • 上一篇    下一篇

一种挖掘频繁闭项集的深度优先算法

张炘1,廖频2,郭波3   

  1. 1. 南昌大学 科学技术学院
    2.
    3. 南昌大学科学技术学院
  • 收稿日期:2009-09-15 修回日期:2009-11-26 发布日期:2010-03-14 出版日期:2010-03-01
  • 通讯作者: 张炘
  • 基金资助:
    江西省自然科学基金项目;江西省自然科学基金项目

Depth-first search algorithm for mining frequent closed itemsets

  • Received:2009-09-15 Revised:2009-11-26 Online:2010-03-14 Published:2010-03-01
  • Supported by:
    the foundation of Jiangxi Natural Science No.2008GZS0033;the foundation of Jiangxi Natural Science No.2008GZS0033

摘要: 频繁闭项集挖掘是许多数据挖掘应用中的重要问题。为减少候选项集数量和降低支持度计算的开销,提出一种新的深度优先搜索频繁闭项集(DFFCI)的算法。将改进的压缩频繁模式树(CFP-Tree)表示的数据集信息投影到划分矩阵,使用二进制向量逻辑运算计算支持度,简化了计算过程,减少了时间开销;采用基于支持度预计算技术的全局2-项剪枝和局部扩展剪枝,有效削减了搜索空间。实验结果表明该算法的性能优于其他主流深度优先算法。

关键词: 数据挖掘, 频繁闭项集, 压缩频繁模式树, 划分矩阵

Abstract: Mining frequent closed itemsets is a fundamental and important issue in many data mining applications. A new depth-first search algorithm for mining frequent closed itemsets called depth-first search for frequent closed itemsets (DFFCI) was proposed, which could reduce the number of candidate itemsets and the cost of support counting. DFFCI projected the dataset information stored by the improved Compressed Frequent Pattern tree (CFP-Tree) into the partition matrix, and improved the efficiency of support counting by using binary vector logic operation. Global 2-itemset pruning based on support pre-counting and local extension pruning were used to prune the search space effectively. The experimental results show that DFFCI outperforms other depth-first search algorithms.

Key words: data mining, frequent closed itemset, Compressed Frequent Pattern Tree (CFP-Tree), partition matrix