Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (12): 3482-3486. DOI: 10.11772/j.issn.1001-9081.2017.12.3482

• Artificial Intelligence •

Efficient clustering algorithm for fast recognition of density backbone

QIU Baozhi, TANG Yamin

  1. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China
  • Received: 2017-06-01 Revised: 2017-08-17 Online: 2017-12-10 Published: 2017-12-18
  • Corresponding author: TANG Yamin
  • About the authors: QIU Baozhi, born in 1964 in Zhumadian, Henan, Ph.D., professor, CCF member. His research interests include data mining and artificial intelligence. TANG Yamin, born in 1991 in Zhengzhou, Henan, M.S. candidate. Her research interests include data mining.
  • Supported by:
    This work is partially supported by the Basic and Advanced Technology Research Project of Henan Province (152300410191).

Abstract: To find the density backbone quickly and to improve clustering accuracy on high-dimensional data, an Efficient CLUstering based on density Backbone (ECLUB) algorithm is proposed. First, after the local density of each object is defined, the high-density backbone is identified quickly from the mutual k-nearest-neighbor consistency and the local density relations between neighboring points. Then, the unassigned low-density points are assigned according to their neighborhood relations to obtain the final clustering. Experiments on synthetic and real datasets verify the effectiveness of the proposed algorithm. On the Olivetti Face dataset, ECLUB achieves an Adjusted Rand Index (ARI) of 0.8779 and a Normalized Mutual Information (NMI) of 0.9622. Compared with the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, the Clustering by Fast search and find of Density Peaks (CFDP) algorithm, and the CLUstering based on Backbone (CLUB) algorithm, ECLUB is more efficient and clusters high-dimensional data more accurately.

Key words: clustering algorithm, high-dimensional data, k-nearest neighbor, density backbone, local density
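
The two-stage idea in the abstract (identify a high-density backbone, then attach the remaining low-density points) can be illustrated with a rough sketch. This is not the authors' ECLUB implementation: the abstract does not give the density definition, the backbone criterion, or the assignment rule, so the code below assumes (1) local density is the inverse of the mean distance to the k nearest neighbors, (2) a point is a backbone point if its density is at or above the median, and (3) each leftover point takes the label of its nearest already-labeled point. The names `knn_lists` and `two_stage_backbone_sketch` are hypothetical, and the brute-force neighbor search is quadratic, so it is only suitable for small examples.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest-neighbor indices and distances for every point."""
    neighbors, distances = [], []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        neighbors.append([j for _, j in ranked])
        distances.append([d for d, _ in ranked])
    return neighbors, distances


def two_stage_backbone_sketch(points, k=3):
    """Illustrative two-stage clustering; all rules below are assumptions."""
    n = len(points)
    nbrs, dists = knn_lists(points, k)

    # ASSUMPTION: local density = inverse mean distance to the k neighbors.
    density = [1.0 / (sum(d) / k + 1e-12) for d in dists]

    # ASSUMPTION: backbone points are those at or above the median density.
    threshold = sorted(density)[n // 2]
    is_backbone = [rho >= threshold for rho in density]

    # Union-find over backbone points.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Stage 1: join backbone points that are mutual k-nearest neighbors.
    for i in range(n):
        if not is_backbone[i]:
            continue
        for j in nbrs[i]:
            if is_backbone[j] and i in nbrs[j]:
                parent[find(i)] = find(j)

    labels = [find(i) if is_backbone[i] else None for i in range(n)]

    # Stage 2: attach the remaining points, densest first, to the cluster
    # of their nearest already-labeled point (ASSUMED assignment rule).
    rest = sorted(
        (i for i in range(n) if not is_backbone[i]),
        key=lambda i: -density[i],
    )
    for i in rest:
        nearest = min(
            (j for j in range(n) if labels[j] is not None),
            key=lambda j: math.dist(points[i], points[j]),
        )
        labels[i] = labels[nearest]
    return labels
```

Calling `two_stage_backbone_sketch(points, k=3)` on a list of coordinate tuples returns one integer label per point; the label values are arbitrary root indices, so only equality between labels is meaningful. Restricting stage 1 to mutual neighbors means a sparse point cannot pull two dense regions together merely because a dense point appears in its own neighbor list.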

CLC Number: