Journal of Computer Applications (计算机应用)

• Artificial intelligence and simulation •

面向变尺度密度数据的分级聚类算法

袁志琴,庄华亮,何熊熊   

  1. College of Information Engineering, Zhejiang University of Technology
  • Received: 2019-12-11 Revised: 2020-02-02 Online: 2021-02-07 Published: 2021-02-07
  • Corresponding author: 庄华亮

Hierarchical clustering algorithm for density-varying data

  • Received: 2019-12-11 Revised: 2020-02-02 Online: 2021-02-07 Published: 2021-02-07

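Abstract (translated from Chinese): Traditional distance-based and density-based clustering algorithms share several common problems: they are unsuitable for data whose density varies across multiple scales and for non-convex data, their clustering quality depends heavily on parameters, and their computational complexity is high. To address these problems, a hierarchical clustering algorithm based on region growing and competition was proposed. The clustering process has three levels. First, the first-level clustering uses Euclidean distance and a distance threshold to partition the objects into a number of small clusters that cover the data space, which also reduces the complexity of the algorithm. Then, the second level applies region growing to the spatial data: the cluster centers already obtained serve as growth seeds, and growth proceeds under a gradually relaxed cluster-radius criterion, which addresses density clustering of variable-scale data. Finally, the third level, based on the idea of competition and the principle of density similarity, computes the weights between cluster centers and merges clusters under appropriate rules, which addresses the clustering of non-convex data. Experimental results show that, compared with the K-means and DBSCAN algorithms, the proposed algorithm overcomes the problems posed by data spaces with variable-scale density while markedly improving clustering accuracy and shortening clustering time.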

Keywords (translated from Chinese): hierarchical clustering, variable-scale data, region growing, relation weight, cluster merging

Abstract: Traditional distance-based and density-based clustering algorithms are poorly suited to density-varying data and non-convex data, and they also suffer from high sensitivity to parameters and high computational complexity. To address these problems, a hierarchical clustering algorithm based on region growing and competitive learning was proposed. The algorithm consists of three phases. In the first phase, the data space was covered with multiple small clusters at low computational cost. In the second phase, spatial-data region growing was applied, taking the obtained cluster centers as seeds; the criterion on cluster radius was gradually relaxed during growth to handle the clustering of density-varying data. In the third phase, based on the idea of competition and the principle of density similarity, the weights between cluster centers were calculated and a set of rules was adopted to merge clusters, which handles the clustering of non-convex data. The experimental results show that, compared with the K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms, the proposed method significantly improves clustering accuracy on density-varying data with competitive processing speed.

Keywords: hierarchical clustering, density-varying data, region growing, link weight, cluster merging
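
The first phase described in the abstract (covering the data space with many small clusters by means of a distance threshold) can be illustrated with a leader-style single pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `first_phase_clusters`, the parameter `threshold`, the nearest-center assignment rule, and the running-mean update of each center are all hypothetical choices that the abstract does not specify.

```python
import math


def first_phase_clusters(points, threshold):
    """Leader-style pass (illustrative assumption, not the paper's exact rule).

    Each point joins the nearest existing center if that center lies within
    `threshold` (Euclidean distance); otherwise it starts a new small cluster.
    Returns (centers, members), where members[i] lists the points of cluster i.
    """
    centers = []  # current center of each small cluster
    members = []  # points assigned to each small cluster
    for p in points:
        p = tuple(p)
        best, best_d = None, threshold
        for i, c in enumerate(centers):
            d = math.dist(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            # No center is close enough: open a new small cluster.
            centers.append(p)
            members.append([p])
        else:
            # Assumed update rule: the center becomes the mean of its members.
            members[best].append(p)
            n = len(members[best])
            centers[best] = tuple(
                sum(q[k] for q in members[best]) / n for k in range(len(p))
            )
    return centers, members


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    centers, members = first_phase_clusters(data, threshold=1.0)
    print(len(centers), [len(m) for m in members])
```

Each point is compared only with the current centers, so the cost of the pass grows with the number of small clusters, not with the number of point pairs, which is in line with the low-complexity aim stated for this phase. The resulting centers would then play the role of the seeds used in the second phase.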

CLC number: