Maximal frequent itemset mining algorithm based on DiffNodeset structure

Journal of Computer Applications, 2018, Vol. 38, Issue 12: 3438-3443. DOI: 10.11772/j.issn.1001-9081.2018040913

YIN Yuan^1,2, ZHANG Chang^1, WEN Kai^1,3, ZHENG Yunjun^1

1. Institute of Applied Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
2. Chongqing Branch, China Telecom Company Limited, Chongqing 401121, China;
3. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China

Received: 2018-04-23; Revised: 2018-07-20; Online: 2018-12-10; Published: 2018-12-15
Contact: ZHANG Chang

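About the authors: YIN Yuan (born 1963), male, from Chongqing, senior engineer, M.S.; research interests: mobile communication, big data. ZHANG Chang (born 1993), male, from Xiaogan, Hubei, M.S. candidate; research interests: data mining, big data, cloud computing. WEN Kai (born 1972), male, from Chongqing, professor-level senior engineer, Ph.D.; research interests: mobile communication, big data. ZHENG Yunjun (born 1992), male, from Chongqing, M.S. candidate; research interests: big data, recommender systems.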

Abstract

In data mining, mining maximal frequent itemsets instead of all frequent itemsets can greatly improve system efficiency, yet existing maximal frequent itemset mining algorithms still consume considerable running time. To address this problem, a DiffNodeset-based Maximal Frequent Itemset Mining (DNMFIM) algorithm is proposed. Firstly, a new data structure, DiffNodeset, is adopted to compute intersections and support quickly. Secondly, a join method with linear time complexity is introduced to reduce the cost of joining two DiffNodesets and to avoid repeated invalid computations. Then, the set-enumeration tree is used as the search space, and several optimized pruning strategies are applied to shrink it. Finally, the superset detection technique used in the MAximal Frequent Itemset Algorithm (MAFIA) is incorporated to effectively improve the accuracy of the algorithm. Experimental results show that DNMFIM outperforms MAFIA and N-list based MAFIA (NB-MAFIA) in time efficiency, and that it performs well when mining maximal frequent itemsets on different types of datasets.
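
As a rough illustration of the search strategy summarized above, the following Python sketch mines maximal frequent itemsets by depth-first traversal of a set-enumeration tree and runs a superset check before recording a result. It is not the DNMFIM algorithm: support is counted with plain transaction-id sets rather than DiffNodesets, only one simple pruning rule is shown, and the function name `mine_maximal` and the toy database are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set


def mine_maximal(transactions: List[Set[str]], min_support: int) -> List[FrozenSet[str]]:
    """Return the maximal frequent itemsets of a small transaction database."""
    # Vertical layout: item -> ids of the transactions that contain it.
    # (Assumption: a stand-in for DiffNodeset, used only to keep the sketch short.)
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items in a fixed order; the order defines the enumeration tree.
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)
    maximal: List[FrozenSet[str]] = []

    def has_superset(candidate: FrozenSet[str]) -> bool:
        # Superset detection: is the candidate contained in a known maximal itemset?
        return any(candidate <= found for found in maximal)

    def search(head: FrozenSet[str], head_tids: Set[int], tail: List[str]) -> None:
        # Pruning: if head plus its whole tail is already covered by a known
        # maximal itemset, this subtree cannot produce a new one.
        if has_superset(head | frozenset(tail)):
            return
        extended = False
        for pos, item in enumerate(tail):
            new_tids = head_tids & tidsets[item]
            if len(new_tids) >= min_support:
                extended = True
                search(head | {item}, new_tids, tail[pos + 1:])
        # A node with no frequent extension is maximal unless a superset is known.
        if not extended and head and not has_superset(head):
            maximal.append(head)

    search(frozenset(), set(range(len(transactions))), items)
    return maximal


# Toy database (hypothetical data for illustration only).
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = mine_maximal(db, 3)
```

With a minimum support of 3, the three 2-itemsets {a, b}, {a, c} and {b, c} are returned, because {a, b, c} occurs in only two transactions; lowering the threshold to 2 leaves {a, b, c} as the single maximal itemset. Reporting only these itemsets is what makes the output far smaller than the full list of frequent itemsets.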

Key words: maximal frequent itemset mining, association rule, set-enumeration tree, optimized pruning, superset detection

CLC Number: TP311.13

YIN Yuan, ZHANG Chang, WEN Kai, ZHENG Yunjun. Maximal frequent itemset mining algorithm based on DiffNodeset structure[J]. Journal of Computer Applications, 2018, 38(12): 3438-3443.
