Fast high average-utility itemset mining algorithm based on utility-list structure

doi:10.11772/j.issn.1001-9081.2016.11.3062

Journal of Computer Applications, 2016, Vol. 36, Issue (11): 3062-3066. DOI: 10.11772/j.issn.1001-9081.2016.11.3062

WANG Jinghua, LUO Xiangzhou, WU Qian

School of Computer, Central China Normal University, Wuhan, Hubei 430079, China

Received: 2016-05-16 Revised: 2016-06-20 Online: 2016-11-12 Published: 2016-11-10
Corresponding author: LUO Xiangzhou
About the authors: WANG Jinghua (born 1965), male, from Hong'an, Hubei, associate professor, M.S., research interests: data mining and modern information systems; LUO Xiangzhou (born 1991), male, from Wuhan, Hubei, M.S. candidate, research interests: databases and data mining; WU Qian (born 1990), female, from Hanchuan, Hubei, M.S. candidate, research interests: data mining and complex networks.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61370108).

Abstract

High utility itemset mining has been widely studied in the field of data mining. However, it does not consider the effect of itemset length on the utility value, which is why high average-utility itemset mining was proposed. Existing high average-utility itemset mining algorithms need a great deal of time to mine the high average-utility itemsets. To solve this problem, an improved high average-utility itemset mining algorithm, named FHAUI (Fast High Average Utility Itemset), is proposed. FHAUI stores the utility information in utility-lists and mines all high average-utility itemsets by comparing these utility-lists. FHAUI also adopts a two-dimensional matrix to reduce the number of join comparisons for two-item utility values. Finally, FHAUI is tested on several classical datasets. The experimental results show that FHAUI greatly reduces the number of utility-list join operations and substantially improves time performance.

Key words: average utility, high utility, pattern mining, data mining, frequent pattern
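To make the measure in the abstract concrete, the following minimal Python sketch computes the average utility of an itemset (its total utility in the supporting transactions divided by its length) and uses a pairwise upper-bound table to skip candidates. It is not the FHAUI algorithm itself: it enumerates candidates by brute force instead of joining utility-lists, and the pair table is only one plausible reading of the two-dimensional matrix mentioned above. The toy database `DB` and the names `average_utility`, `pair_upper_bounds`, and `mine` are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility
# (quantity times unit profit, already multiplied out).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "b": 2, "d": 4},
]


def average_utility(itemset, db):
    """Utility of the itemset summed over supporting transactions, divided by its length."""
    total = sum(
        sum(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )
    return total / len(itemset)


def pair_upper_bounds(db):
    """Pair table (assumed design): item pair -> sum of transaction-maximum utilities.

    The average utility of an itemset inside one transaction can never exceed
    the largest single-item utility of that transaction, so this sum is an
    upper bound for every itemset that contains the pair.
    """
    bounds = {}
    for t in db:
        tmu = max(t.values())  # transaction-maximum utility
        for pair in combinations(sorted(t), 2):
            bounds[pair] = bounds.get(pair, 0) + tmu
    return bounds


def mine(db, min_util):
    """Brute-force search; candidates containing a low-bound pair are skipped."""
    items = sorted({i for t in db for i in t})
    bounds = pair_upper_bounds(db)
    result, skipped = {}, 0
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            # Skip without computing the utility if any contained pair is hopeless.
            if any(bounds.get(p, 0) < min_util for p in combinations(cand, 2)):
                skipped += 1
                continue
            au = average_utility(cand, db)
            if au >= min_util:
                result[cand] = au
    return result, skipped


if __name__ == "__main__":
    found, skipped = mine(DB, min_util=10)
    print(found)
    print("candidates skipped by the pair table:", skipped)
```

With a threshold of 10 on this toy data, the itemset {a, c} has a total utility of 22 but an average utility of 11, which shows how dividing by length changes the ranking relative to plain utility. In a utility-list based miner, each skipped candidate would correspond to a list join that is never performed.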

CLC Number:

TP311.13

WANG Jinghua, LUO Xiangzhou, WU Qian. Fast high average-utility itemset mining algorithm based on utility-list structure[J]. Journal of Computer Applications, 2016, 36(11): 3062-3066.

