计算机应用 ›› 2015, Vol. 35 ›› Issue (3): 775-778.DOI: 10.11772/j.issn.1001-9081.2015.03.775

• 人工智能 • 上一篇    下一篇

改进的基于频繁模式树的最大频繁项集挖掘算法——FP-MFIA

杨鹏坤1,2,3, 彭慧1,2,3, 周晓锋1,2,3, 孙玉庆4   

  1. 1. 中国科学院 物联网研究发展中心, 江苏 无锡 214135;
    2. 江苏物联网研究发展中心, 江苏 无锡 214135;
    3. 无锡中科泛在信息技术研发中心有限公司, 江苏 无锡 214135;
    4. 国网枣庄供电公司, 山东 枣庄 277100
  • 收稿日期:2014-09-05 修回日期:2014-11-25 出版日期:2015-03-10 发布日期:2015-03-13
  • 通讯作者: 杨鹏坤
  • 作者简介:杨鹏坤(1987-),男,山东日照人,硕士研究生,主要研究方向:数据分析与处理;彭慧(1963-),男,辽宁沈阳人,研究员,硕士,主要研究方向:制造执行系统;周晓峰(1978-),女,辽宁本溪人,副研究员,博士,主要研究方向:机器学习、数据挖掘;孙玉庆(1988-),男,山东日照人,助理工程师,主要研究方向:调度自动化
  • 基金资助:

    国家科技支撑计划项目(2012BAF12B08)

FP-MFIA: improved algorithm for mining maximum frequent itemsets based on frequent-pattern tree

YANG Pengkun1,2,3, PENG Hui1,2,3, ZHOU Xiaofeng1,2,3, SUN Yuqing4   

  1. 1. Research and Development Center for Internet of Things, Chinese Academy of Sciences, Wuxi Jiangsu 214135, China;
    2. Jiangsu Research and Development Center for Internet of Things, Wuxi Jiangsu 214135, China;
    3. Chinese Academy of Sciences Ubiquitous Information Technology Research and Development Center Company Limited, Wuxi Jiangsu 214135, China;
    4. State Grid Zaozhuang Power Supply Company, Zaozhuang Shandong 277100, China
  • Received:2014-09-05 Revised:2014-11-25 Online:2015-03-10 Published:2015-03-13

摘要:

针对最大频繁项目集挖掘算法(DMFIA)当候选项目集维数高而最大频繁项目集维数较低的情况下要产生大量的候选项目集的缺点,提出了一种改进的基于频繁模式树(FP-tree)结构的最大频繁项目集挖掘算法——FP-MFIA。该算法根据FP-tree的项目头表,采用自底向上的搜索策略逐层挖掘最大频繁项目集,从而加速每次对候选集计数的操作。在挖掘时根据每层的条件模式基产生维数较低的非频繁项目集,尽早对候选项目集进行剪枝和降维,可大量减少候选项目集的数量。同时在挖掘时充分利用最大频繁项集的性质,减少搜索空间。通过算法在不同支持度下挖掘时间的对比可知,算法FP-MFIA在最小支持度较低的情况下时间效率是DMFIA以及基于降维的最大频繁模式挖掘算法(BDRFI)的2倍以上,说明FP-MFIA在候选集维数较高的时候优势明显。

关键词: 最大频繁项集, 频繁模式树, 数据挖掘, 关联规则, 非频繁项集

Abstract:

Focusing on the drawback that Discovering Maximum Frequent Itemsets Algorithm (DMFIA) has to generate lots of maximum frequent candidate itemsets in each dimension when given datasets with many candidate items and each maximum frequent itemset is not long, an improved Algorithm for mining Maximum Frequent Itemsets based of Frequent-Pattern tree (FP-MFIA) for mining maximum frequent itemsets based on FP-tree was proposed. According to Htable of FP-tree, this algorithm used bottom-up searches to mine maximum frequent itemsets, thus accelerated the count of candidates. Producing infrequent itemsets with lower dimension according to conditional pattern base of every layer when mining, cutting and reducing dimensions of candidate itemsets can largely reduce the amount of candidate itemsets. At the same time taking full advantage of properties of maximum frequent itemsets will reduce the search space. The time efficiency of FP-MFIA is at least two times as much as the algorithm of DMFIA and BDRFI (algorithm for mining frequent itemsets based on dimensionality reduction of frequent itemset) according to computational time contrast based on different supports. It shows that FP-MFIA has a clear advantage when candidate itemsets are with high dimension.

Key words: maximum frequent itemset, Frequent Pattern tree (FP-tree), data mining, association rule, infrequent itemset

中图分类号: