Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (11): 3062-3066. DOI: 10.11772/j.issn.1001-9081.2016.11.3062

• Advanced Computing

Fast high average-utility mining algorithm based on utility lists

WANG Jinghua, LUO Xiangzhou, WU Qian

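  1. School of Computer, Central China Normal University, Wuhan 430079
  • Received: 2016-05-16 Revised: 2016-06-20 Online: 2016-11-10 Published: 2016-11-12
  • Corresponding author: LUO Xiangzhou
  • About the authors: WANG Jinghua (born 1965), male, from Hong'an, Hubei; associate professor, M.S.; main research interests: data mining and modern information systems. LUO Xiangzhou (born 1991), male, from Wuhan, Hubei; master's student; main research interests: databases and data mining. WU Qian (born 1990), female, from Hanchuan, Hubei; master's student; main research interests: data mining and complex networks.
  • Supported by:
    National Natural Science Foundation of China (61370108).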

Fast high average-utility itemset mining algorithm based on utility-list structure

WANG Jinghua, LUO Xiangzhou, WU Qian   

  1. School of Computer, Central China Normal University, Wuhan, Hubei 430079, China
  • Received: 2016-05-16 Revised: 2016-06-20 Online: 2016-11-10 Published: 2016-11-12
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61370108).

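Abstract: High utility itemset mining has attracted wide attention in the data mining field, but it does not take into account the effect of itemset length on the utility value, which is why high average-utility itemset mining was proposed. Some existing high average-utility itemset mining algorithms, however, need a large amount of time to mine the valid high average-utility itemsets. To address this problem, an improved high average-utility itemset mining algorithm, FHAUI, is presented. FHAUI stores utility information in utility lists and mines all high average-utility itemsets by comparing utility lists; it also uses a two-dimensional matrix to effectively reduce the number of join comparisons for two-item utility values. Finally, FHAUI was tested on several classic datasets. The experimental results show that FHAUI greatly reduces the number of utility-list join comparisons and also substantially improves running time.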

Keywords: average utility, high utility, pattern mining, data mining, frequent pattern

Abstract: High utility itemset mining has been widely studied in the field of data mining. However, it does not consider the effect of itemset length on utility, and high average-utility itemset mining was proposed to address this issue. Existing high average-utility itemset mining algorithms take a long time to mine the high average-utility itemsets. To solve this problem, an improved high average-utility itemset mining algorithm, named FHAUI (Fast High Average Utility Itemset), is proposed. FHAUI stores the utility information in utility lists and mines all the high average-utility itemsets from the utility-list structure. In addition, FHAUI adopts a two-dimensional matrix to effectively reduce the number of join operations. Experimental results on several classical datasets show that FHAUI greatly reduces the number of join operations and also reduces running time.

Key words: average utility, high utility, pattern mining, data mining, frequent pattern
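
The notion of average utility that the abstract relies on can be illustrated with a small sketch. It assumes the standard definition from the high average-utility literature: the utility of an itemset, summed over the transactions that contain all of its items, divided by the number of items in the itemset. This is a brute-force illustration only, not the FHAUI algorithm; it uses neither utility lists nor the two-dimensional matrix. The toy database, the function names, and the absolute threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (quantity times unit profit, already multiplied out).
TRANSACTIONS = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "b": 2, "d": 4},
]


def utility(itemset, transactions):
    """Sum the utilities of the itemset's items over every transaction
    that contains all of them."""
    return sum(
        sum(t[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def average_utility(itemset, transactions):
    """Utility divided by itemset length (assumed standard definition)."""
    return utility(itemset, transactions) / len(itemset)


def high_average_utility_itemsets(transactions, min_avg_utility):
    """Enumerate every itemset and keep those whose average utility
    reaches the (hypothetical, absolute) threshold. Exponential in the
    number of items, so suitable for toy data only."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            au = average_utility(itemset, transactions)
            if au >= min_avg_utility:
                result[itemset] = au
    return result


if __name__ == "__main__":
    for itemset, au in high_average_utility_itemsets(TRANSACTIONS, 10).items():
        print(itemset, au)
```

In this toy data the itemset ("a", "c") has a plain utility of 22, higher than the 20 of ("a",) alone, while its average utility is 11; dividing by length is what keeps longer itemsets from being favored merely for containing more items.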
