
CCML2017, Paper No. 267: A Grid Clustering Algorithm Based on Density Peaks

Yang Jie (杨洁), Wang Guoyin (王国胤), Wang Fei (王飞)

  1. Chongqing University of Posts and Telecommunications
  • Received: 2017-06-14 Published online: 2017-06-14
  • Corresponding author: Yang Jie

A Grid Clustering Algorithm Based on Density Peaks

  • Received: 2017-06-14 Online: 2017-06-14
  • Contact: Yang Jie

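Abstract: The density peak clustering algorithm proposed in 2014 rests on a concise and novel idea: it needs few parameters, requires no iterative solution, and is scalable. Building on density peak clustering, this paper proposes a grid clustering algorithm that processes large-scale data efficiently. First, the N-dimensional space is granulated into disjoint rectangular grid cells, and statistics are collected for each cell. Next, the center-finding idea of density peak clustering is used to identify central cells: a central grid cell is surrounded by cells of lower local density and lies relatively far from any cell of higher local density. Finally, grid cells close to a central cell are merged with it to produce the clustering result. Simulation experiments on UCI and synthetic datasets show that the proposed algorithm finds cluster centers quickly, handles the clustering of large-scale data effectively, and is highly efficient: compared with the original density peak clustering algorithm, its time cost is 10 to 100 times lower on different datasets, while the loss of accuracy stays between 5% and 8%.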
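A minimal Python sketch of the first step, granulating N-dimensional space into disjoint rectangular cells and collecting per-cell statistics, might look like the following. The abstract does not say which statistics the authors keep; this sketch assumes each cell records its point count (later usable as local density) and its centroid. The function name `granulate` and the per-dimension `widths` parameter are hypothetical.

```python
import math
from collections import defaultdict


def granulate(points, widths):
    """Assign N-dimensional points to disjoint rectangular grid cells.

    Hypothetical helper: `widths` gives the cell side length along each
    dimension, so cell (i_1, ..., i_N) covers [i_d * w_d, (i_d + 1) * w_d)
    in dimension d. Returns {cell_index: {"count": int, "centroid": tuple}};
    keeping the count and centroid as the cell statistics is an assumption.
    """
    members = defaultdict(list)
    for p in points:
        # Integer index of the cell containing p along each dimension.
        key = tuple(math.floor(x / w) for x, w in zip(p, widths))
        members[key].append(p)

    stats = {}
    for key, pts in members.items():
        n = len(pts)
        # Mean position of the points in the cell, dimension by dimension.
        centroid = tuple(sum(coords) / n for coords in zip(*pts))
        stats[key] = {"count": n, "centroid": centroid}
    return stats


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (5.1, 5.2), (5.3, 5.4), (1.5, 0.5)]
cells = granulate(points, widths=(1.0, 1.0))
for key in sorted(cells):
    print(key, cells[key]["count"])
# (0, 0) 3
# (1, 0) 1
# (5, 5) 2
```

Because `widths` is given per dimension, the cells can be rectangles rather than squares; with `widths=(2.0, 1.0)` the point (1.5, 0.5) falls into cell (0, 0) together with the first three points.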

Keywords: density peak, grid granulation, large-scale data, clustering

Abstract: The density peak clustering (DPC) algorithm, proposed in 2014, is built on a concise and novel idea; it requires few parameters and no iteration. In this paper, a grid clustering algorithm based on DPC is proposed. First, the N-dimensional space is divided into disjoint rectangular cells. Then the central cells of the space are found following the idea of DPC: a central cell is surrounded by grid cells of lower local density, and its distance to any grid cell of higher local density is relatively large. Finally, grid cells close to their central cells are merged to obtain the clustering result. The experiments show that the algorithm finds cluster centers quickly and handles the clustering of large-scale data effectively. Compared with the original density peak clustering algorithm, it is more efficient on different data sets, reducing the time cost by a factor of 10 to 100 while keeping the loss of accuracy at 5% to 8%.
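The center-selection and merging steps can be sketched at the cell level in the spirit of DPC. This is an illustrative sketch, not the authors' implementation. It assumes a cell's local density ρ is its point count and that distances are measured between cell centroids. It picks the `n_centers` cells with the largest γ = ρ·δ, a common way of reading the DPC decision graph. In place of the paper's unspecified merging rule, it lets every other cell join the cluster of its nearest denser cell, which is how the original DPC assigns non-center points. The function name `cluster_cells` and its input format are hypothetical.

```python
import math


def cluster_cells(cells, n_centers):
    """DPC-style clustering of grid cells (illustrative sketch).

    `cells` maps a cell index to {"count": int, "centroid": tuple}; the count
    is used as the cell's local density rho. Returns {cell_index: cluster_id}.
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")

    # Rank cells by density, densest first; ties are broken by cell index.
    order = sorted(cells, key=lambda k: (-cells[k]["count"], k))
    rho = {k: cells[k]["count"] for k in order}

    def dist(a, b):
        return math.dist(cells[a]["centroid"], cells[b]["centroid"])

    delta, parent = {}, {}
    for i, k in enumerate(order[1:], start=1):
        # delta: distance to the nearest cell ranked denser than k.
        nearest = min(order[:i], key=lambda j: dist(k, j))
        delta[k] = dist(k, nearest)
        parent[k] = nearest
    # The densest cell gets its largest distance to any other cell (DPC convention).
    top = order[0]
    delta[top] = max((dist(top, j) for j in order[1:]), default=0.0)

    # Central cells: dense AND far from denser cells, i.e. large rho * delta.
    # Since delta[k] <= dist(k, top) <= delta[top] and rho[k] <= rho[top], the
    # densest cell's gamma is never smaller than any other cell's, and the stable
    # sort keeps it first, so it is always selected as a center.
    gamma = {k: rho[k] * delta[k] for k in order}
    centers = sorted(order, key=lambda k: -gamma[k])[:n_centers]
    center_id = {k: c for c, k in enumerate(centers)}

    # Merge: walk cells from densest to sparsest; a non-center cell inherits the
    # cluster of its nearest denser cell, which has already been labeled.
    labels = {}
    for k in order:
        labels[k] = center_id[k] if k in center_id else labels[parent[k]]
    return labels


cells = {
    (0, 0): {"count": 3, "centroid": (0.5, 0.5)},
    (5, 5): {"count": 2, "centroid": (5.5, 5.5)},
    (1, 0): {"count": 1, "centroid": (1.5, 0.5)},
}
print(cluster_cells(cells, n_centers=2))
# {(0, 0): 0, (5, 5): 1, (1, 0): 0}
```

Each data point can then take the label of the cell it falls in. In this sketch the pairwise distance work runs over non-empty cells rather than individual points, which is why working on a grid can be much cheaper than point-level DPC when many points share a cell.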

Keywords: density peak, grid granulation, large-scale data, clustering
