Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (4): 1122-1127.DOI: 10.11772/j.issn.1001-9081.2020071042

Special Issue: 数据科学与技术

• Data science and technology • Previous Articles     Next Articles

Improved wavelet clustering algorithm based on peak grid

LONG Chaoqi, JIANG Yu, XIE Yu   

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu Sichuan 610225, China
  • Received:2020-07-17 Revised:2020-09-21 Online:2021-04-10 Published:2020-10-20

基于峰值网格改进的小波聚类算法

龙超奇, 蒋瑜, 谢雨   

  1. 成都信息工程大学 软件工程学院, 成都 610225
  • 通讯作者: 蒋瑜
  • 作者简介:龙超奇(1996—),男,四川德阳人,硕士研究生,主要研究方向:数据挖掘、智能计算、小波聚类;蒋瑜(1980—),男,四川邻水人,副教授,硕士,主要研究方向:入侵检测、粗糙集、数据挖掘、智能计算;谢雨(1996—),男,四川内江人,硕士研究生,主要研究方向:数据挖掘、智能计算、异常点检测。

Abstract: Aiming at the difference between the clustering effects of wavelet clustering algorithm under different grid division scales, an improved method based on peak grid was proposed. The algorithm mainly aimed at improving the detection method of connected regions in wavelet clustering. First, the spatial grids after wavelet transform were sorted according to the grid values; then, the breadth-first-search method was used to traverse each spatial grid to detect the peak connected regions in the data after wavelet transform; finally, the connected regions were marked and mapped to the original data space to obtain the clustering result. Experimental results of 8 synthetic datasets(4 convex datasets and 4 non-convex datasets) and 2 real datasets in the UCI database showed that the improved algorithm had good performance at low grid division scales, and compared with the original wavelet clustering algorithm, this algorithm had the requirement for grid division scale reduced by 25% to 60%, and the clustering time reduced by 14% under the same clustering effect.

Key words: grid scale, peak grid, wavelet clustering, connected region, Breadth-First-Search (BFS)

摘要: 针对小波聚类算法在不同网格划分尺度下表现出的聚类效果差异,提出了一种基于峰值网格的改进方法。算法主要针对小波聚类中连通区域的检测方式进行改进:首先,将小波变换后的空间网格依网格值的大小进行排序;然后利用广度优先搜索的方式遍历每一个空间网格,以检测经小波变换后数据中的峰值连通区域;最后,标记连通区域并将其映射到原数据空间中,以得出聚类结果。在8个人工数据集(4个凸数据集与4个非凸数据集)和UCI数据库中的2个真实数据集上的实验结果表明,改进算法在低网格划分尺度下有着良好的表现,与原小波聚类算法相比,这个算法对网格划分尺度的需求降低了25%~60%,并且在相同的聚类效果下减少了14%的聚类所需时间。

关键词: 网格尺度, 峰值网格, 小波聚类, 连通区域, 广度优先搜索(BFS)

CLC Number: