FP-MFIA: improved algorithm for mining maximum frequent itemsets based on frequent-pattern tree

doi:10.11772/j.issn.1001-9081.2015.03.775

Journal of Computer Applications, 2015, Vol. 35, Issue (3): 775-778. DOI: 10.11772/j.issn.1001-9081.2015.03.775

YANG Pengkun^1,2,3, PENG Hui^1,2,3, ZHOU Xiaofeng^1,2,3, SUN Yuqing^4

1. Research and Development Center for Internet of Things, Chinese Academy of Sciences, Wuxi Jiangsu 214135, China;
2. Jiangsu Research and Development Center for Internet of Things, Wuxi Jiangsu 214135, China;
3. Chinese Academy of Sciences Ubiquitous Information Technology Research and Development Center Company Limited, Wuxi Jiangsu 214135, China;
4. State Grid Zaozhuang Power Supply Company, Zaozhuang Shandong 277100, China

Received: 2014-09-05 Revised: 2014-11-25 Online: 2015-03-13 Published: 2015-03-10
Corresponding author: YANG Pengkun
About the authors: YANG Pengkun (1987-), male, born in Rizhao, Shandong, M.S. candidate; main research interests: data analysis and processing. PENG Hui (1963-), male, born in Shenyang, Liaoning, research fellow, M.S.; main research interests: manufacturing execution systems. ZHOU Xiaofeng (1978-), female, born in Benxi, Liaoning, associate research fellow, Ph.D.; main research interests: machine learning and data mining. SUN Yuqing (1988-), male, born in Rizhao, Shandong, assistant engineer; main research interests: dispatching automation.
Supported by:
National Key Technology R&D Program of China (2012BAF12B08)

Abstract:

The Discovering Maximum Frequent Itemsets Algorithm (DMFIA) generates a large number of candidate itemsets when the candidate itemsets have high dimension but the maximum frequent itemsets have low dimension. To address this drawback, an improved algorithm for mining maximum frequent itemsets based on the Frequent-Pattern tree (FP-tree), named FP-MFIA, was proposed. Following the item header table (Htable) of the FP-tree, the algorithm uses a bottom-up search strategy to mine maximum frequent itemsets layer by layer, which speeds up each counting pass over the candidate set. During mining, it derives low-dimensional infrequent itemsets from the conditional pattern base of each layer, so that candidate itemsets are pruned and reduced in dimension as early as possible, which greatly reduces the number of candidate itemsets. It also makes full use of the properties of maximum frequent itemsets to shrink the search space. A comparison of mining times under different support thresholds shows that, when the minimum support is low, the time efficiency of FP-MFIA is at least twice that of DMFIA and of BDRFI (an algorithm for mining maximum frequent itemsets based on dimensionality reduction of frequent itemsets), which indicates that FP-MFIA has a clear advantage when the candidate itemsets have high dimension.

Key words: maximum frequent itemset, Frequent Pattern tree (FP-tree), data mining, association rule, infrequent itemset
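To make the target of the mining concrete, the following is a minimal Python sketch of what "maximum frequent itemsets" means and of the pruning property the abstract relies on (any superset of an infrequent itemset is itself infrequent). It is a brute-force, level-wise reference and not FP-MFIA itself: it builds no FP-tree and no conditional pattern bases. The function name `maximal_frequent_itemsets`, the use of an absolute count for `min_support`, and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the maximal frequent itemsets as a set of frozensets.

    Illustrative brute-force reference only (not FP-MFIA, no FP-tree).
    min_support is an absolute transaction count (assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 1
    while level:
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset
        # cannot be frequent, so it is dropped before counting.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
    # An itemset is maximal if no frequent itemset strictly contains it.
    return {f for f in frequent if not any(f < g for g in frequent)}


# Hypothetical toy database of five transactions.
toy = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"], ["b", "d"], ["c", "e"]]
result = maximal_frequent_itemsets(toy, min_support=2)
```

With a support threshold of 2, the toy database yields two maximal itemsets, {a, b, c} and {b, d}; every other frequent itemset is a subset of one of them, which is why a compact set of maximal itemsets is enough to describe all frequent ones.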

CLC number: TP311

YANG Pengkun, PENG Hui, ZHOU Xiaofeng, SUN Yuqing. FP-MFIA: improved algorithm for mining maximum frequent itemsets based on frequent-pattern tree[J]. Journal of Computer Applications, 2015, 35(3): 775-778.

