Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (2): 396-400.

• Data Technology •

Distributed association rule mining algorithm with load balancing based on a vertical frequent pattern tree

FENG Yong, YIN Jiena, XU Hongyan

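  1. School of Information, Liaoning University, Shenyang 110036
  • Received: 2013-08-09 Revised: 2013-10-18 Online: 2014-02-01 Published: 2014-03-01
  • Corresponding author: XU Hongyan
  • About the authors: FENG Yong (born 1973), male, from Shenyang, Liaoning, associate professor, Ph.D.; research interests: data mining, business intelligence. YIN Jiena (born 1988), female, from Yushu, Jilin, master's student; research interests: distributed association rule mining. XU Hongyan (born 1972), female, from Dandong, Liaoning, associate professor, M.S.; research interests: Web mining, data management.
  • Supported by:
    Youth Fund for Humanities and Social Sciences Research of the Ministry of Education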

Distributed rules mining algorithm with load balance based on vertical FP-tree

FENG Yong, YIN Jiena, XU Hongyan

  1. School of Information, Liaoning University, Shenyang, Liaoning 110036, China
  • Received: 2013-08-09 Revised: 2013-10-18 Online: 2014-02-01 Published: 2014-03-01
  • Contact: XU Hongyan

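Abstract: In the big data era, knowledge discovery over massive, distributed data has become a focus of both academia and industry, and load balancing is one of the key factors that must be considered when developing a distributed mining algorithm. To this end, a distributed association rule mining algorithm with load balancing based on a vertical frequent pattern tree is proposed. The algorithm stores items and their associations in a vertical frequent pattern tree, so local mining results do not need to be merged, which reduces the communication volume and simplifies the processing flow. The algorithm also adopts a hybrid architecture in which the central site assigns tasks according to the processing capacity of each local site, which achieves load balancing and improves the algorithm's performance. Experimental results show that the proposed algorithm is feasible and achieves relatively high efficiency.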

Keywords: association rule mining, distributed, vertical frequent pattern, load balancing, serialization

Abstract: In the era of big data, research on knowledge discovery over massive, distributed data has become a hot topic in both academia and industry, and load balancing is one of the important factors that must be considered when developing a distributed mining algorithm. This paper therefore proposes a distributed association rules mining algorithm with load balancing based on the vertical FP-tree (VFP-LBDM). The algorithm uses a vertical frequent pattern tree to store items and their associations, so the local mining results do not need to be combined; this reduces the communication cost and simplifies the processing procedure. The algorithm also uses a hybrid architecture in which the central site assigns tasks according to the processing capacity of each local site, which achieves load balancing and improves the performance of the algorithm. Experiments show that the proposed algorithm is feasible and achieves relatively high efficiency.

Key words: association rules mining, distributed, Vertical Frequent Pattern (VFP), load balancing, serialization
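
The abstract states that the central site assigns tasks according to the processing capacity of each local site, but it does not give the assignment rule. The following is a minimal sketch of one plausible reading, not the VFP-LBDM procedure itself: data partitions are split among sites in proportion to a numeric capacity score, with leftovers handed out by the largest-remainder rule. The function name `assign_partitions`, the site names, and the capacity scores are all hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int,
                      capacity: Dict[str, float]) -> Dict[str, List[int]]:
    """Split partition ids 0..num_partitions-1 among sites in proportion
    to their capacity scores (hypothetical rule, not taken from the paper)."""
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if not capacity or any(c < 0 for c in capacity.values()):
        raise ValueError("capacity must be non-empty and non-negative")
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("at least one site must have positive capacity")

    # Ideal (fractional) share of partitions for each site.
    ideal = {site: num_partitions * c / total for site, c in capacity.items()}
    # Start from the rounded-down share.
    counts = {site: int(share) for site, share in ideal.items()}
    leftover = num_partitions - sum(counts.values())

    # Largest-remainder rule: the sites whose ideal share was cut the most
    # by rounding down each receive one extra partition.
    by_remainder = sorted(capacity, key=lambda s: ideal[s] - counts[s],
                          reverse=True)
    for site in by_remainder[:leftover]:
        counts[site] += 1

    # Hand out contiguous blocks of partition ids in site order.
    assignment: Dict[str, List[int]] = {}
    next_id = 0
    for site in capacity:
        assignment[site] = list(range(next_id, next_id + counts[site]))
        next_id += counts[site]
    return assignment


if __name__ == "__main__":
    plan = assign_partitions(10, {"site_a": 4.0, "site_b": 2.0, "site_c": 1.0})
    for site, parts in plan.items():
        print(site, parts)
```

With capacities 4 : 2 : 1 and ten partitions, this sketch gives `site_a` six partitions, `site_b` three, and `site_c` one. How capacity is actually measured (CPU, memory, network) is left open here, as the abstract does not specify it.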

CLC number: