Frequent subtree mining method based on coverage patterns

doi:10.11772/j.issn.1001-9081.2017.09.2439

Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (9): 2439-2442.DOI: 10.11772/j.issn.1001-9081.2017.09.2439

Previous Articles Next Articles

Frequent subtree mining method based on coverage patterns

XIA Ying, LI Hongxu

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received:2017-03-27 Revised:2017-04-25 Online:2017-09-10 Published:2017-09-13
Supported by:
This work is partially supported by National Natural Science Foundation of China (41201378)

基于覆盖模式的频繁子树挖掘方法

夏英, 李洪旭

重庆邮电大学计算机科学与技术学院, 重庆 400065

通讯作者: 李洪旭,565268915@qq.com
作者简介:夏英(1972-),女,四川南充人,教授,博士生导师,博士,主要研究方向:数据库与数据挖掘、云计算与大数据、空间信息处理;李洪旭(1990-),男,四川资阳人,硕士研究生,主要研究方向:数据挖掘与大数据。
基金资助:
国家自然科学基金资助项目（41201378）。

Abstract

Abstract: Unordered tree is widely used for semi-structured data modeling, frequent subtrees mining on it has benefit for finding hidden knowledge. The traditional methods of mining frequent subtrees often output large-scale frequent subtrees with redundant information, such an output will reduce the efficiency of subsequent operations. In view of the shortcomings of traditional methods, the Mining CoveRage Pattern (MCRP) algorithm was proposed for mining coverage patterns. Firstly, a tree coding rule according to the tree width and the number of children was presented. Then, all candidate subtrees were generated by edge extension based on the maximum prefix coding sequence. Finally, a set of coverage patterns was output on the basis of frequent subtrees and δ'-coverage concept. Compared with the traditional algorithms for mining frequent closed tree patterns and maximal frequent tree patterns, the proposed algorithm can output fewer frequent subtrees in the case of preserving all the frequent subtrees, and the processing efficiency is increased by 15% to 25%.The experimental results show that the algorithm can effectively reduce the size and redundant information of the output frequent subtrees, and it has high feasibility in practical operation.

Key words: unordered tree, frequent subtree, maximum prefix coding, edge extension, coverage pattern

摘要： 无序树常用于半结构化数据建模，对其进行频繁子树挖掘有利于发现隐藏的知识。传统的频繁子树挖掘方法常常输出大规模且带有冗余信息的频繁子树，这样的输出结果会降低后续操作的效率。针对传统方法的不足，提出了一种用于挖掘覆盖模式（MCRP）算法。首先，采用宽度孩子数编码对树进行编码；然后，通过基于最大前缀编码序列的边扩展方式生成所有的候选子树；最后，在频繁子树集和δ'-覆盖概念的基础上输出覆盖模式集。与传统的挖掘频繁闭树模式和极大频繁树模式的算法相比，该算法能够在保留所有频繁子树信息的情况下输出更少的频繁子树，并且将处理效率提高15%到25%。实验结果表明，所提算法能有效减小输出频繁子树的规模，减少冗余信息，在实际操作中具有较高的可行性。

关键词: 无序树, 频繁子树, 最大前缀编码, 边扩展, 覆盖模式

CLC Number:

TP391.4

XIA Ying, LI Hongxu. Frequent subtree mining method based on coverage patterns[J]. Journal of Computer Applications, 2017, 37(9): 2439-2442.

夏英, 李洪旭. 基于覆盖模式的频繁子树挖掘方法[J]. 计算机应用, 2017, 37(9): 2439-2442.

References

[1] YAKOP M A M, MUTALIB S, ABDUL-RAHMAN S. Data projection effects in frequent itemsets mining[M]//Soft Computing in Data Science. Berlin:Springer, 2015:23-32.
[2] MALVIYA J, SINGH A, SINGH D. An FP tree based approach for extracting frequent pattern from large database by applying parallel and partition projection[J]. International Journal of Computer Applications, 2015, 114(18):1-5.
[3] DEEPAK A, FERNÁNDEZ-BACA D, TIRTHAPURA S, et al. EvoMiner:frequent subtree mining in phylogenetic databases[J]. Knowledge and Information Systems, 2014, 41(3):559-590.
[4] 吴倩,罗健旭.压缩FP-Tree的改进搜索算法[J]. 计算机工程与设计,2015,36(7):1771-1777.(WU Q, LUO J X. Improved search algorithm of compressed FP-Tree[J]. Computer Engineering and Design, 2015, 36(7):1771-1777.)
[5] ZHANG S, DU Z, WANG J T. New techniques for mining frequent patterns in unordered trees[J]. IEEE Transactions on Cybernetics, 2015, 45(6):1113-1125.
[6] FENG B, XU Y, ZHAO N, et al. A new method of mining frequent closed trees in data streams[C]//Proceedings of the 20107th International Conference on Fuzzy Systems and Knowledge Discovery. Piscataway, NJ:IEEE, 2010:2245-2249.
[7] 杨沛,谭琦.极大频繁子树挖掘及其应用[J].计算机科学,2008,35(2):150-153.(YAN P, TAN Q. Maximal frequent subtree mining and its application[J]. Computer Science, 2008, 35(2):150-153.)
[8] XIN D, HAN J, YAN X, et al. Mining compressed frequent-pattern sets[C]//Proceedings of the 31st International Conference on Very Large Data Bases. Trondheim, Norway:VLDB Endowment, 2005:709-720.
[9] LIU L, LIU J. Mining frequent embedded subtree from tree-like databases[C]//Proceedings of the 2011 International Conference on Internet Computing and Information Services. Washington, DC:IEEE Computer Society, 2011:3-7.
[10] JAIN A K, DUBES R C. Algorithms for Clustering Data[M]. Upper Saddle River, NJ:Prentice-Hall, Inc., 1988:67-73.
[11] HAN K, LV W, YIN B, et al. Constrained frequent subtree mining method[C]//Proceedings of the 20145th International Conference on Digital Home. Washington, DC:IEEE Computer Society, 2014:287-292.

Frequent subtree mining method based on coverage patterns

基于覆盖模式的频繁子树挖掘方法

PDF

Knowledge

Abstract

Cite this article

share this article

References

Related Articles 2

Recommended Articles

Metrics

[1]	. Novel tree cluster and classification approach based on least closed tree [J]. Journal of Computer Applications, 2010, 30(2): 423-426.
[2]	. Novel incremental updating algorithm for frequent subtrees [J]. Journal of Computer Applications, 2010, 30(05): 1300-1303.