Non-check mining algorithm of maximum frequent patterns in association rules based on FP-tree

Journal of Computer Applications (计算机应用), 2010, Vol. 30, Issue (07): 1922-1925.

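Hui Liang (惠亮)¹, Qian Xuezhong (钱雪忠)²

1. School of Information Engineering, Jiangnan University
2. Jiangnan University

Received: 2010-01-04 Revised: 2010-03-08 Online: 2010-07-01 Published: 2010-07-01
Corresponding author: Hui Liang
Funding:
Natural Science Foundation of Jiangsu Province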

Abstract

Abstract: FP-tree-based algorithms are currently among the more efficient approaches for mining maximal frequent patterns, but they must recursively generate conditional FP-trees and perform superset checking. To address these problems, a non-check mining algorithm for maximal frequent patterns, NCMFP (Non-Check Mining algorithm of Maximum Frequent Pattern), is proposed on the basis of an analysis of the DMFIA-1 algorithm. NCMFP modifies the structure of the FP-tree so that mining requires neither the generation of conditional frequent pattern trees nor superset checking. Its predictive pruning strategy reduces the number of mining passes, and its method of taking common intersections ensures that the mining result is complete. Experimental results show that, at relatively low support thresholds, NCMFP is 2 to 5 times as efficient as comparable algorithms.

Key words: association rules, data mining, frequent pattern tree (FP-tree), maximal frequent itemsets, superset checking
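
For readers unfamiliar with the terms used in the abstract, the following Python sketch illustrates two of them: a plain (unmodified) FP-tree, and a brute-force baseline that finds maximal frequent itemsets by explicit superset checking, which is the step NCMFP is designed to avoid. It is not NCMFP or DMFIA-1; the abstract does not describe the modified tree structure, the pruning strategy, or the intersection step in enough detail to reproduce them. The names `FPNode`, `build_fp_tree`, and `maximal_frequent_itemsets`, as well as the sample transactions, are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


class FPNode:
    """One node of a plain FP-tree: an item, a count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a standard FP-tree and return (root, item_order)."""
    counts = Counter(item for t in transactions for item in set(t))
    # Keep frequent items, ordered by descending support (ties broken by name).
    order = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None)
    for t in transactions:
        node = root
        # Insert the transaction's frequent items along a shared prefix path.
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root, order


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force baseline: enumerate frequent itemsets, then superset-check."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    frequent = []
    for k in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, k)
                 if sum(1 for s in sets if s.issuperset(c)) >= min_support]
        if not level:
            break  # no frequent k-itemset, so no larger itemset is frequent
        frequent.extend(level)
    # Superset check: keep an itemset only if no frequent proper superset exists.
    return [f for f in frequent if not any(f < g for g in frequent)]


# Hypothetical sample data, with an absolute support threshold of 2.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c", "d"]]
root, order = build_fp_tree(transactions, min_support=2)
maximal = maximal_frequent_itemsets(transactions, min_support=2)
```

In this sample, `{a, b, c}` is the only maximal frequent itemset, since every other frequent itemset is a proper subset of it. The final filtering line compares each frequent itemset against every other one, which shows why superset checking becomes costly as the number of candidates grows.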

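Hui Liang, Qian Xuezhong. Non-check mining algorithm of maximum frequent patterns in association rules based on FP-tree[J]. Journal of Computer Applications, 2010, 30(07): 1922-1925.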
