Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (2): 441-445.DOI: 10.11772/j.issn.1001-9081.2018061328

Previous Articles     Next Articles

Improved canonical-order tree algorithm based on restructure

DU Yuan1, ZHANG Shiwei2   

  1. 1. College of Information Engineering, China Jiliang University, Hangzhou Zhejiang 310018, China;
    2. Modern Educational Technology Center, China Jiliang University, Hangzhou Zhejiang 310018, China
  • Received:2018-06-25 Revised:2018-09-06 Online:2019-02-10 Published:2019-02-15

基于重构的改进自然排序树算法

杜媛1, 张世伟2   

  1. 1. 中国计量大学 信息工程学院, 杭州 310018;
    2. 中国计量大学 现代教育技术中心, 杭州 310018
  • 通讯作者: 杜媛
  • 作者简介:杜媛(1991-),女,安微宿州人,硕士研究生,主要研究方向:数据挖掘、大数据分析;张世伟(1979-),男,浙江杭州人,工程师,主要研究方向:软件工程、数据库。

Abstract: In order to solve the problems such as too many nodes and low compressibility in the tree structure constructed by CANonical-order tree (CAN-tree) algorithm, an improved CAN-tree algorithm based on restructure was proposed. Firstly, a tree structure was constructed directly with canonical-order, which scans the database only once in the frequent itemset mining algorithm. Then, in order to get a tree structure with high compressibility, a pruning operation was used with support in desending order to restructure the tree. Finally, frequent itemsets were mined out for the reconstructed tree structure. The experimental results show that compared with original CAN-tree algorithm, the number of nodes constructed by the improved CAN-tree algorithm is reduced to less than 20%,and the execution efficiency is improved by 4 to 6 times. The proposed algorithm shortens the execution time of the frequent itemset mining algorithm and effectively compresses the tree structure in it.

Key words: frequent itemset, frequent itemset header table, restructure, pruning, minimum support

摘要: 针对自然排序树(CAN-tree)算法构建的树结构节点个数过多、压缩性不高等问题,提出一种基于重构的改进CAN-tree算法。首先,使用自然排序法直接构建树结构,将频繁项集挖掘算法实现中数据库扫描次数减少至1;然后,对构建的树结构以支持度降序方式结合剪枝操作实现树结构的重构,得到高压缩性的树结构;最后,对重构的树结构进行频繁项集挖掘。实验结果表明,基于重构的改进CAN-tree算法所构建的树结构节点个数减少至原来的20%以下,执行效率提高了4至6倍,在频繁项集挖掘中有效地压缩了树结构,缩短了算法的执行时间。

关键词: 频繁项集, 频繁项集头表, 重构, 剪枝, 最小支持度

CLC Number: