计算机应用 ›› 2014, Vol. 34 ›› Issue (8): 2175-2178.DOI: 10.11772/j.issn.1001-9081.2014.08.2175

• 第五届中国数据挖掘会议(CCDM 2014)论文 • 上一篇    下一篇

并行挖掘频繁项目集新算法——MREclat

章志刚,吉根林,唐梦梦   

  1. 南京师范大学 计算机科学与技术学院,南京210023
  • 收稿日期:2014-04-30 修回日期:2014-05-10 出版日期:2014-08-01 发布日期:2014-08-10
  • 通讯作者: 吉根林
  • 作者简介:章志刚(1988-),男,江苏海安人,硕士研究生,主要研究方向:云计算、数据挖掘;吉根林(1964-),男,江苏海安人,教授,博士生导师,博士,CCF高级会员,主要研究方向:数据挖掘;唐梦梦(1990-),女,江苏连云港人,硕士研究生,主要研究方向:空间轨迹挖掘。
  • 基金资助:

    江苏省自然科学基金资助项目

MREclat: new algorithm for parallel mining frequent itemsets

ZHANG Zhigang,JI Genlin,TANG Mengmeng   

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing Jiangsu 210023, China
  • Received:2014-04-30 Revised:2014-05-10 Online:2014-08-01 Published:2014-08-10
  • Contact: JI Genlin

摘要:

针对Eclat算法在挖掘海量数据中的频繁项目集时存在的内存和计算资源不足等问题,提出了基于Map/Reduce计算模型的并行挖掘算法——MREclat。首先,将水平型数据库转换成垂直型数据库;然后,将转换后的数据按2-项集的前缀分发到各个计算节点上,且在分发数据时引入了均衡策略;接着,在各个计算节点上求出以某一前缀开头的所有频繁项目集;最后,合并各个节点的结果得到所有频繁项目集。介绍了MREclat的设计思想,研究了算法的运行性能。实验结果表明,MREclat算法效率大约是PEclat算法的2倍,加速比性能比PEclat算法提高了64%。

Abstract:

Aiming at the problem that the memory and computational capability is insufficient while using Eclat algorithm to mine frequent itemsets from massive dataset, a parallel mining algorithm based on Map/Reduce framework, called MREclat(MapReduce Eclat), was proposed. Firstly, MREclat algorithm converted the horizontal database into a vertical one. Secondly, it redistributed the converted dataset according to the first item of each frequent 2-itemset and load-balance was taken into consideration while distributing datasets. Then, all the frequent itemsets prefixed by the same item were computed in each computing node. Finally, MREclat algorithm collected the result of each computing node and generated the whole frequent itemsets. In this paper, the idea of MREclat was introduced and the performance of the algorithm was studied. The experimental results show that MREclat algorithm is twice as efficient as PEclat algorithm, and the speedup performance of MREclat algorithm is 64% higher than that of PEclat.

中图分类号: