Research and improvement on Apriori algorithm of association rule mining

Journal of Computer Applications (计算机应用), 2010, Vol. 30, Issue 11: 2952-2955.

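Cui Guanxun (崔贯勋)¹, Li Liang (李梁)², Wang Keke (王柯柯)², Gou Guanglei (苟光磊)², Zou Hang (邹航)²

1. Chongqing University of Technology (重庆理工大学)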
2.

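Received: 2010-05-18  Revised: 2010-07-12  Online: 2010-11-05  Published: 2010-11-01
Corresponding author: Cui Guanxun
Supported by:
Scientific research project of the Ministry of Education; Chongqing Science and Technology Research Program project; Chongqing Science and Technology Research Program project; Chongqing Natural Science Foundation project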

Abstract

Abstract: The classic Apriori algorithm for generating frequent itemsets has three drawbacks: it scans the database many times, it may produce a large number of candidate itemsets, and it repeatedly performs pattern matching between candidate itemsets and transactions. Together these make the algorithm inefficient. To address them, the Apriori algorithm was improved in three respects. First, the join and prune strategies used when generating candidate (k+1)-itemsets from frequent k-itemsets were improved. Second, the way transactions are processed was improved to reduce the time spent on pattern matching. Third, the initial processing of the database was improved so that the whole algorithm scans the database only once. An improved algorithm is proposed on the basis of these changes, with gains in both time and space. Experimental results show that the improved algorithm is clearly more efficient than the original.

Key words: data mining, association rule, Apriori algorithm, frequent itemset, candidate itemset
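The abstract names the join step, the prune step and a single database scan, but it does not say how the authors changed them. The sketch below is therefore not the paper's algorithm. It shows the classic join and prune steps, combined with one well-known way to avoid rescanning the data: a vertical layout in which each item maps to the set of transaction ids that contain it, so that support counts come from set intersections. The names `build_tidsets`, `generate_candidates`, `apriori` and `min_count` are illustrative assumptions.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Single pass over the data: map each item to the ids of the
    transactions that contain it (an assumed vertical layout, not
    necessarily the representation used in the paper)."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def generate_candidates(frequent_k):
    """Classic Apriori join + prune.

    frequent_k is a set of sorted tuples of equal length k.
    Returns the candidate (k+1)-itemsets as sorted tuples.
    """
    candidates = set()
    ordered = sorted(frequent_k)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join: only itemsets sharing the first k-1 items are merged.
            if a[:-1] != b[:-1]:
                break  # sorted order: later itemsets cannot share the prefix
            cand = a + (b[-1],)
            # Prune: every k-subset of a candidate must itself be frequent.
            if all(sub in frequent_k for sub in combinations(cand, len(a))):
                candidates.add(cand)
    return candidates


def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all itemsets
    that occur in at least min_count transactions."""
    tidsets = build_tidsets(transactions)  # the only pass over the data
    current = {(item,): tids for item, tids in tidsets.items()
               if len(tids) >= min_count}
    result = {}
    while current:
        result.update({iset: len(tids) for iset, tids in current.items()})
        next_level = {}
        for cand in generate_candidates(set(current)):
            # The two k-itemsets that were joined to form the candidate.
            left, right = cand[:-1], cand[:-2] + cand[-1:]
            tids = current[left] & current[right]
            if len(tids) >= min_count:
                next_level[cand] = tids
        current = next_level
    return result
```

Calling `apriori(transactions, min_count)` with a list of item sets returns every frequent itemset with its support count. The intersection-based counting trades memory for fewer scans, which is one possible reading of the "scan once" goal in the abstract, not a confirmed description of it.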

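Cui Guanxun, Li Liang, Wang Keke, Gou Guanglei, Zou Hang. Research and improvement on Apriori algorithm of association rule mining[J]. Journal of Computer Applications, 2010, 30(11): 2952-2955.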
