Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (11): 2952-2955.

• Database and Data Mining •

Research on and Improvement of the Apriori Algorithm in Association Rule Mining

CUI Guanxun1, LI Liang2, WANG Keke2, GOU Guanglei2, ZOU Hang2

  1. Chongqing University of Technology
    2.
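  • Received: 2010-05-18 Revised: 2010-07-12 Online: 2010-11-05 Published: 2010-11-01
  • Corresponding author: CUI Guanxun
  • Supported by:
    Scientific Research Project of the Ministry of Education; Chongqing Science and Technology Research Program; Chongqing Science and Technology Research Program; Natural Science Foundation Project of Chongqing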

Research and improvement on Apriori algorithm of association rule mining


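Abstract: The classic Apriori algorithm for generating frequent itemsets has three drawbacks: it scans the database many times, it may produce a large number of candidates, and it repeatedly matches candidate itemsets against transactions. Together these make the algorithm inefficient. To address this, Apriori is improved in three ways: (1) the join and prune strategies used to generate candidate (k+1)-itemsets from frequent k-itemsets are improved; (2) the handling of transactions is improved to reduce the time spent on pattern matching; (3) the initial processing of the database is improved so that the whole algorithm scans the database only once. An improved algorithm is proposed on this basis. Experimental results show that the improved algorithm performs significantly better.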
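For reference, the classic join and prune steps that the first improvement targets can be sketched as follows. This is only the textbook baseline, not the authors' improved strategy, which the abstract does not spell out. The function name `generate_candidates` and the representation of itemsets as sorted tuples are assumptions made for illustration.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Classic Apriori candidate generation (baseline sketch).

    frequent_k: iterable of frequent k-itemsets (any iterables of
    comparable items). Returns the set of candidate (k+1)-itemsets
    as sorted tuples.
    """
    frequent = {tuple(sorted(s)) for s in frequent_k}
    ordered = sorted(frequent)
    candidates = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: only merge itemsets sharing their first k-1 items.
            # Itemsets with the same prefix are contiguous in sorted order,
            # so the first mismatch ends the search for partners of `a`.
            if a[:-1] != b[:-1]:
                break
            cand = a + (b[-1],)
            # Prune step: every k-subset of the candidate must be frequent.
            if all(sub in frequent for sub in combinations(cand, len(a))):
                candidates.add(cand)
    return candidates


frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(generate_candidates(frequent_2))
```

In this example, joining `("B", "C")` with `("B", "D")` yields `("B", "C", "D")`, which is pruned because its subset `("C", "D")` is not frequent.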

Key words: data mining, association rule, Apriori algorithm, frequent itemset, candidate itemset

Abstract: The classic Apriori algorithm for discovering frequent itemsets scans the database many times, may generate a large number of candidate itemsets, and repeatedly performs pattern matching between candidate itemsets and transactions, all of which lower its efficiency. The algorithm was therefore improved in three respects. First, the join and prune strategies used to generate candidate (k+1)-itemsets from frequent k-itemsets were improved. Second, the handling of transactions was improved to reduce the time spent on pattern matching. Third, the initial processing of the database was improved so that the whole algorithm scans the database only once. An improved algorithm based on these changes is proposed, and it improves on Apriori in both time and space. Experimental results show that the improved algorithm is more efficient than the original.
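The abstract does not say how the single database scan is achieved. One common way to get that property, shown here purely as an assumption and not as the authors' method, is a vertical layout: a single pass records for each item the set of transaction ids containing it, and the support of every larger itemset is then obtained by intersecting those id sets instead of rescanning. The function name `frequent_itemsets_single_scan` and the parameter `min_count` (an absolute support threshold) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets_single_scan(transactions, min_count):
    """Level-wise frequent itemset mining with one pass over the data.

    Assumed technique: vertical tid-sets. Returns a dict mapping each
    frequent itemset (sorted tuple) to its support count.
    """
    # The only scan of the database: build item -> transaction-id set.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[(item,)].add(tid)

    level = {s: t for s, t in tidsets.items() if len(t) >= min_count}
    result = {}
    while level:
        result.update({s: len(t) for s, t in level.items()})
        ordered = sorted(level)
        next_level = {}
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if a[:-1] != b[:-1]:
                    break  # join only itemsets with a common (k-1)-prefix
                cand = a + (b[-1],)
                # Prune candidates that have an infrequent k-subset.
                if any(sub not in level for sub in combinations(cand, len(a))):
                    continue
                # Support comes from intersecting tid-sets, not a rescan.
                tids = level[a] & level[b]
                if len(tids) >= min_count:
                    next_level[cand] = tids
        level = next_level
    return result


transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]
supports = frequent_itemsets_single_scan(transactions, min_count=3)
print(sorted(supports.items()))
```

The trade-off in this sketch is memory: the id sets must be held in memory, which is the price paid for avoiding repeated scans and per-transaction pattern matching.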

Key words: data mining, association rule, Apriori algorithm, frequent itemsets, candidate itemsets