Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (7): 1988-1992.DOI: 10.11772/j.issn.1001-9081.2016.07.1988

Previous Articles     Next Articles

Improved algorithm for mining collaborative frequent itemsets in multiple data streams

WANG Xin1,2, LIU Fang'ai1,2   

  1. 1. College of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China;
    2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology (Shandong Normal University), Jinan Shandong 250014, China
  • Received:2015-12-02 Revised:2016-03-01 Online:2016-07-10 Published:2016-07-14
  • Supported by:
    This work is partially supported by National Natural Science Foundation of China (61572301, 90612003), Natural Science Foundation of Shandong Province (ZR2013FM008).

改进的多数据流协同频繁项集挖掘算法

王鑫1,2, 刘方爱1,2   

  1. 1. 山东师范大学 信息科学与工程学院, 济南 250014;
    2. 山东省分布式计算机软件新技术重点实验室(山东师范大学), 济南 250014
  • 通讯作者: 刘方爱
  • 作者简介:王鑫(1992-),女,山东德州人,硕士研究生,CCF会员,主要研究方向:数据挖掘、大数据分析;刘方爱(1962-),男,山东青岛人,教授,博士生导师,博士,主要研究方向:数据挖掘、大数据分析、分布式计算。
  • 基金资助:
    国家自然科学基金资助项目(61572301,90612003);山东省自然科学基金资助项目(ZR2013FM008)。

Abstract: In view of low memory usage rate and inefficient discovery for mining frequent itemsets in multiple data streams, an improved Mining Collaborative frequent itemsets in Multiple Data Stream (MCMD-Stream) algorithm was proposed. Firstly, the window sliding based on bit-sequence technique was utilized, which was a single-pass algorithm to find the potential and frequent itemsets. Secondly, Compressed frequent Pattern Tree (CP-Tree), which is similar to Frequent Pattern Tree (FP-Tree), was constructed to store the potential and frequent itemsets. And each node in the CP-Tree could generate the logarithmic tilted window to save the counts of frequent itemsets. Finally, the valuable frequent itemsets that appeared repeatedly in multiple data streams, namely collaborative frequent itemsets, were got. Compared to A-Stream and H-Stream algorithms, MCMD-Stream algorithm can improve the mining efficiency of collaborative frequent itemsets in multiple data streams, and also reduce the usage of the memory space. The experimental results show that MCMD-Stream algorithm can efficiently be applied to mine the collaborative frequent itemsets in multiple data streams.

Key words: stream data mining, multiple data stream, sliding window, frequent itemset, collaborative frequent itemset

摘要: 针对已有的多数据流协同频繁项集挖掘算法存在内存占用率高以及发现频繁项集效率低的问题,提出了改进的多数据流协同频繁项集挖掘(MCMD-Stream)算法。首先,该算法利用单遍扫描数据库的字节序列滑动窗口挖掘算法发现数据流中的潜在频繁项集和频繁项集;其次,构建类似频繁模式树(FP-Tree)的压缩频繁模式树(CP-Tree)存储已发现的潜在频繁项集和频繁项集,同时更新CP-Tree树中每个节点生成的对数倾斜时间表中的频繁项计数;最后,通过汇总分析得出在多条数据流中多次出现的且有价值的频繁项集,即协同频繁项集。相比A-Stream和H-Stream算法,MCMD-Stream算法不仅能够提高多数据流中协同频繁项集挖掘的效率,并且还降低了内存空间的使用率。实验结果表明MCMD-Stream算法能够有效地应用于多数据流的协同频繁项集挖掘。

关键词: 流数据挖掘, 多数据流, 滑动窗口, 频繁项集, 协同频繁项集

CLC Number: