Improved algorithm for mining collaborative frequent itemsets in multiple data streams

doi:10.11772/j.issn.1001-9081.2016.07.1988

Journal of Computer Applications, 2016, Vol. 36, Issue (7): 1988-1992. DOI: 10.11772/j.issn.1001-9081.2016.07.1988

WANG Xin^1,2, LIU Fang'ai^1,2

1. College of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China;
2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology (Shandong Normal University), Jinan Shandong 250014, China

Received: 2015-12-02 Revised: 2016-03-01 Online: 2016-07-10 Published: 2016-07-14
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572301, 90612003) and the Natural Science Foundation of Shandong Province (ZR2013FM008).

Corresponding author: LIU Fang'ai
About the authors: WANG Xin (born 1992), female, from Dezhou, Shandong, M.S. candidate, CCF member. Her research interests include data mining and big data analysis. LIU Fang'ai (born 1962), male, from Qingdao, Shandong, Ph.D., professor, doctoral supervisor. His research interests include data mining, big data analysis, and distributed computing.

Abstract

Abstract: To address the high memory usage and low mining efficiency of existing algorithms for mining collaborative frequent itemsets in multiple data streams, an improved algorithm, Mining Collaborative frequent itemsets in Multiple Data Streams (MCMD-Stream), was proposed. Firstly, a single-pass sliding window technique based on bit-sequences was used to find the potential frequent itemsets and the frequent itemsets in each data stream. Secondly, a Compressed frequent Pattern Tree (CP-Tree), similar to the Frequent Pattern Tree (FP-Tree), was constructed to store these itemsets, and each node in the CP-Tree maintained a logarithmic tilted-time window in which the counts of the frequent itemsets were updated. Finally, the valuable frequent itemsets that appeared repeatedly in multiple data streams, namely the collaborative frequent itemsets, were obtained by aggregate analysis. Compared with the A-Stream and H-Stream algorithms, the MCMD-Stream algorithm improves the efficiency of mining collaborative frequent itemsets in multiple data streams and reduces memory usage. The experimental results show that the MCMD-Stream algorithm can be applied effectively to mining collaborative frequent itemsets in multiple data streams.
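
The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea, not the MCMD-Stream algorithm itself. It assumes that each item keeps one bit-sequence over the last `size` transactions of a stream, that support is counted with a bitwise AND, and that an itemset counts as collaborative when it is frequent in at least `min_streams` streams. All names (`BitSequenceWindow`, `frequent_itemsets`, `collaborative_itemsets`, `min_count`, `min_streams`) are hypothetical, and the CP-Tree is left out.

```python
from itertools import combinations


class BitSequenceWindow:
    """Sliding window over the last `size` transactions, one bit-sequence per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1  # keeps only the `size` most recent bits
        self.bits = {}               # item -> int used as a bit-sequence

    def add(self, transaction):
        # Slide the window: shift left by one; the oldest bit is cut off by the mask.
        for item in self.bits:
            self.bits[item] = (self.bits[item] << 1) & self.mask
        # The lowest bit marks the newest transaction.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1
        # Forget items that no longer occur anywhere in the window.
        self.bits = {i: b for i, b in self.bits.items() if b}

    def support(self, itemset):
        """Number of window transactions that contain every item of a non-empty itemset."""
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


def frequent_itemsets(window, min_count, max_len=2):
    """Itemsets of up to `max_len` items whose support in the window is >= min_count."""
    items = sorted(i for i in window.bits if window.support((i,)) >= min_count)
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            count = window.support(combo)
            if count >= min_count:
                result[frozenset(combo)] = count
    return result


def collaborative_itemsets(per_stream_results, min_streams):
    """Itemsets that are frequent in at least `min_streams` streams (assumed definition)."""
    seen = {}
    for result in per_stream_results:
        for itemset in result:
            seen[itemset] = seen.get(itemset, 0) + 1
    return {s for s, n in seen.items() if n >= min_streams}


# Two toy streams, each read in a single pass.
streams = [
    [["a", "b"], ["a", "b"], ["a", "c"], ["a", "b"]],
    [["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "b", "c"]],
]
results = []
for stream in streams:
    window = BitSequenceWindow(size=4)
    for transaction in stream:
        window.add(transaction)
    results.append(frequent_itemsets(window, min_count=3))

collab = collaborative_itemsets(results, min_streams=2)
print(sorted(sorted(s) for s in collab))  # [['a'], ['a', 'b'], ['b']]
```

Storing each item as an integer makes sliding the window a single shift and makes support counting a bitwise AND followed by a bit count, which is the usual motivation for bit-sequence representations.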

Key words: stream data mining, multiple data stream, sliding window, frequent itemset, collaborative frequent itemset
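
The logarithmic tilted window mentioned in the abstract can be illustrated with a small Python sketch. This is an assumption-laden illustration of the commonly described scheme (one main slot and one intermediate buffer per level, with level i covering 2**i batches), not the node layout used in the paper's CP-Tree. The class name `LogTiltedTimeWindow` and its methods are hypothetical.

```python
class LogTiltedTimeWindow:
    """Per-itemset counts kept at granularities 1, 1, 2, 4, ... batches (newest first)."""

    def __init__(self):
        self.main = []   # main[i]: newest count at level i, covering 2**i batches
        self.extra = []  # extra[i]: older count at level i, or None if empty

    def add(self, count):
        """Record the count of the newest batch, merging older slots when a level is full."""
        carry, level = count, 0
        while carry is not None:
            if level == len(self.main):
                self.main.append(None)
                self.extra.append(None)
            old = self.main[level]
            self.main[level] = carry
            if old is None:
                carry = None                      # level was empty
            elif self.extra[level] is None:
                self.extra[level] = old           # shift the old value into the buffer
                carry = None
            else:
                carry = self.extra[level] + old   # merge two slots and push them up a level
                self.extra[level] = None
            level += 1

    def slots(self):
        """Stored counts ordered from the newest to the oldest time span."""
        out = []
        for m, e in zip(self.main, self.extra):
            out.extend(c for c in (m, e) if c is not None)
        return out

    def total(self):
        return sum(self.slots())


ttw = LogTiltedTimeWindow()
for _ in range(8):
    ttw.add(1)           # one occurrence in each of eight batches
print(ttw.slots())       # [1, 1, 2, 4]
print(ttw.total())       # 8
```

Because older batches are merged into coarser slots, the number of stored counts grows only logarithmically with the number of batches, while the total count is preserved.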

CLC Number: TP301.6

WANG Xin, LIU Fang'ai. Improved algorithm for mining collaborative frequent itemsets in multiple data streams[J]. Journal of Computer Applications, 2016, 36(7): 1988-1992.
