Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (8): 2357-2361.DOI: 10.11772/j.issn.1001-9081.2017.08.2357

Previous Articles     Next Articles

Fast algorithm for mining frequent patterns based on B-list

LI Xiaolin1,2, DU Tuo1, LIU Biao1   

  1. 1. Institute of Applied Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
    2. Chongqing Information Technology Designing Company Limited, Chongqing 400065, China
  • Received:2016-12-13 Revised:2017-03-03 Online:2017-08-10 Published:2017-08-12
  • Supported by:
    This work is partially supported by the Chongqing Postgraduate Research Innovation Fund (CYS15166).

基于B-list的快速频繁模式挖掘算法

李校林1,2, 杜托1, 刘彪1   

  1. 1. 重庆邮电大学 通信新技术应用研究中心, 重庆 400065;
    2. 重庆信科设计有限公司, 重庆 400065
  • 作者简介:李校林(1968-),男,江西宁都人,正高级工程师,硕士,主要研究方向:移动通信、大数据;杜托(1993-),男,河南三门峡人,硕士研究生,主要研究方向:大数据、数据挖掘;刘彪(1991-),男,河南南阳人,硕士研究生,主要研究方向:分布式计算、数据挖掘。
  • 基金资助:
    重庆市研究生科研创新基金资助项目(CYS15166)。

Abstract: In order to solve the problems in the existing frequent pattern mining algorithms, such as complex tree building and low mining efficiency, a high-performance algorithm for mining frequent patterns was proposed, namely B-List Frequent Pattern Mining (BLFPM). A new data structure of Building list (B-list) was employed to represent frequent itemsets, and the frequent patterns were directly discovered by intersecting two B-lists without scanning the database. Aiming at the high time complexity of connecting two B-lists, a linear time complexity connection method was proposed to improve the time efficiency of BLFPM. Besides, set-enumeration search tree and an efficient pruning strategy were adopted to greatly reduce the search space and speed up the execution. Compared to N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI) and prepost algorithm, the time efficiency of BLFPM was improved by about 12% to 29%, and the space efficiency of BLFPM was improved by about 10% to 24%. The experimental results show that BLFPM has good performance for both sparse database and intensive database.

Key words: data mining, pattern mining, frequent itemset, Traversal when Building tree (TB-tree), Building list (B-list)

摘要: 针对现有的频繁模式挖掘算法存在建树复杂、挖掘效率低等问题,提出一种基于构造链表(B-list)的频繁模式挖掘(BLFPM)算法。BLFPM使用一种新的数据结构B-list表示频繁项集,通过连接两个k-1-频繁项集的B-list可以快速得到k-项集的支持度,避免了多次扫描数据库;针对连接两个B-list时间复杂度高的问题,给出了一种线性时间复杂度的连接方法,提高了BLFPM的时间效率;同时,BLFPM采用集合枚举树代表搜索空间,并使用子集非频繁剪枝策略,减小了频繁模式挖掘的搜索空间,提高了算法的执行速度。实验结果表明,与NSFI算法和prepost算法相比,BLFPM的时间效率提高约12%到29%,空间效率提高约10%到24%,对稀疏数据库或稠密数据库进行频繁模式挖掘均可以得到良好的效果。

关键词: 数据挖掘, 模式挖掘, 频繁项集, 遍历构造树, 构造链表

CLC Number: