Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (2): 423-426.

• Computer Software •

Clustering and classification method based on the least closed tree feature set

Guo Xin1, Li Yun2, Huang Yun2, Zhou Qingping2

  1. Jishou University
    2.
  • Received: 2009-08-07 Revised: 2009-10-11 Online: 2010-02-10 Published: 2010-02-01
  • Corresponding author: Guo Xin

Novel tree cluster and classification approach based on least closed tree

  • Received:2009-08-07 Revised:2009-10-11 Online:2010-02-10 Published:2010-02-01
  • Contact: Guo Xin

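Abstract: A clustering and classification method based on the least closed tree feature set is proposed, which addresses the problem that clustering and classification become infeasible in practical applications when the data volume is large. The basic idea is as follows: the least closed tree feature set serves as the set of candidate features for clustering and classification, and trees are clustered by similarity under a dynamic threshold, which makes tree clustering fast and accurate; the concept of a tree classification rule grade is introduced and applied in the tree classification method, so that unknown tree structures can be predicted quickly. Experimental results show that the new method is effective and feasible when trees have many nodes or the data volume is large, and that it is significantly more efficient than other methods of the same kind.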

Key words: data mining, frequent subtree, closed tree pattern, tree clustering, tree classification

Abstract: A tree clustering and classification algorithm based on the least closed tree is proposed, which effectively solves the problem that clustering and classification cannot be completed in practical applications when the amount of data is very large. Least closed trees are used as the candidate features for clustering and classification. A dynamic threshold is used for similarity-based clustering, which makes tree clustering fast and accurate. In addition, the concept of a tree classification rule grade is proposed and applied in the tree classification algorithm, so that unknown tree structures can be predicted promptly. Experimental results show that the method is faster and more efficient than other similar methods, especially when trees have a large number of nodes.

Key words: data mining, frequent subtree, closed tree pattern, tree clustering, tree classification
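
The clustering idea in the abstract (trees represented by closed-tree features, grouped by similarity under a dynamic threshold) can be illustrated with a minimal sketch. The abstract does not give the similarity measure or the threshold rule, so everything concrete here is an assumption made for illustration: each tree is reduced to a set of hypothetical feature identifiers, similarity is taken to be the Jaccard coefficient, and the threshold for joining a cluster rises with the cluster's size. The names `jaccard`, `cluster_trees`, `start`, `step`, and `cap` are invented for this sketch and are not from the paper.

```python
from typing import FrozenSet, List

# Assumption: each tree is summarized by the IDs of the closed-subtree
# features it contains. Mining those features is outside this sketch.
Features = FrozenSet[str]


def jaccard(a: Features, b: Features) -> float:
    """Jaccard similarity of two feature sets (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_trees(trees: List[Features], start: float = 0.5,
                  step: float = 0.1, cap: float = 0.9) -> List[List[int]]:
    """Greedy single-pass clustering with a size-dependent threshold.

    Assumed rule: a cluster with n members accepts a new tree only if the
    similarity to the cluster's shared features is at least
    min(cap, start + step * (n - 1)).
    """
    clusters: List[List[int]] = []   # indices of member trees per cluster
    cores: List[Features] = []       # features shared by all members
    for i, feats in enumerate(trees):
        best, best_sim = -1, -1.0
        for c, core in enumerate(cores):
            threshold = min(cap, start + step * (len(clusters[c]) - 1))
            sim = jaccard(feats, core)
            if sim >= threshold and sim > best_sim:
                best, best_sim = c, sim
        if best < 0:
            # No cluster is similar enough: start a new one.
            clusters.append([i])
            cores.append(feats)
        else:
            # Join the most similar cluster and shrink its shared features.
            clusters[best].append(i)
            cores[best] = cores[best] & feats
    return clusters


if __name__ == "__main__":
    sample = [frozenset("abc"), frozenset("abcd"), frozenset("xy"),
              frozenset("xyz"), frozenset("ab")]
    print(cluster_trees(sample))
```

Raising the threshold as a cluster grows is only one plausible reading of "dynamic threshold"; the paper's actual rule may differ, and the tree classification rule grade is not sketched here because the abstract gives no definition for it.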