计算机应用 ›› 2019, Vol. 39 ›› Issue (2): 398-402.DOI: 10.11772/j.issn.1001-9081.2018061411

• 数据科学与技术 • 上一篇    下一篇

基于密度峰值与密度聚类的集成算法

王治和, 黄梦莹, 杜辉, 秦红武   

  1. 西北师范大学 计算机科学与工程学院 兰州 730070
  • 收稿日期:2018-07-10 修回日期:2018-09-06 出版日期:2019-02-10 发布日期:2019-02-15
  • 通讯作者: 黄梦莹
  • 作者简介:王治和(1965-),男,甘肃武威人,教授,硕士,主要研究方向:数据挖掘;黄梦莹(1990-),女,河南许昌人,硕士研究生,CCF会员,主要研究方向:数据挖掘、机器学习;杜辉(1976-),女,甘肃兰州人,副教授,博士,主要研究方向:数据挖掘、智能计算;秦红武(1978-),男,甘肃兰州人,副教授,博士,主要研究方向:大数据、社会计算。
  • 基金资助:
    国家自然科学基金资助项目(61662068)。

Integrated algorithm based on density peaks and density-based clustering

WANG Zhihe, HUANG Mengying, DU Hui, QIN Hongwu   

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou Gansu 730070, China
  • Received:2018-07-10 Revised:2018-09-06 Online:2019-02-10 Published:2019-02-15
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61662068).

摘要: 针对快速搜索和发现密度峰值聚类(CFSFDP)算法需人工在决策图上选择聚类中心的问题,提出一种基于密度峰值和密度聚类的集成算法。首先,借鉴CFSFDP思想,将局部密度最大的数据作为第一个中心;接着,从该中心点出发采用一种利用Warshall算法求解密度相连改进的基于密度的噪声应用空间聚类(DBSCAN)算法进行聚类,得到第一个簇;最后,在尚未被划分的数据中找出最大局部密度的数据,将它作为下一个簇的中心后再次采用上述算法进行聚类,直到所有数据被聚类或有部分数据被视为噪声。所提算法既解决了CFSFDP选择中心需人工干预的问题,又优化了DBSCAN算法,即每次迭代都是从当前最好的点(局部密度最大的点)出发寻找簇。通过可视化数据集和非可视化数据集与经典算法(CFSFDP、DBSCAN、模糊C均值(FCM)算法和K均值(K-means)算法)的对比实验结果表明,所提算法聚类效果更好,准确率更高,优于对比算法。

关键词: 密度峰值, 密度聚类, Warshall算法, 决策图, 聚类中心

Abstract: In order to solve the problem that Clustering by Fast Search and Find of Density Peaks (CFSFDP) needs to manually select the center on the decision graph, an Integrated Algorithm Based on Density Peaks and Density-based Clustering (IABDPDC) was proposed. Firstly, learning from the principle of CFSFDP, the data with the largest local density was selected as the first center. Then, from the first center, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm improved by Warshall algorithm was used to cluster to obtain the first category. Finally, from the data that has not been clustered, the maximum local density data was found out as the center of the next category and was clustered again by the above algorithm, until all the data was clustered or some data was considered as noise. The proposed algorithm not only solves the problem of manual center selection in CFSFDP, but also optimizes the DBSCAN algorithm, in which, every iteration starts from the current best point (the point with the largest local density). By comparing with the classical algorithms (such as CFSFDP, DBSCAN, fuzzy C-means (FCM) and K-means) on visual datasets and non-visualized datasets, the experimental results show that the proposed algorithm has better clustering effect with higher accuracy.

Key words: density peak, density-based clustering, Warshall algorithm, decision graph, cluster center

中图分类号: