Mixed density peaks clustering algorithm

Journal of Computer Applications, 2019, Vol. 39, Issue (2): 403-408. DOI: 10.11772/j.issn.1001-9081.2018061373

WANG Jun¹,², ZHOU Kai¹, CHENG Yong²

1. School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China;
2. Technology Industry Department, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China

Received: 2018-07-02 Revised: 2018-08-24 Online: 2019-02-15 Published: 2019-02-10
Corresponding author: ZHOU Kai
About the authors: WANG Jun (born 1970), male, from Tongling, Anhui, professor, Ph.D., CCF member; research interests: wireless sensor networks, big data. ZHOU Kai (born 1993), male, from Lianyungang, Jiangsu, master's student; research interests: big data. CHENG Yong (born 1980), male, from Chongqing, senior engineer, Ph.D., CCF member; research interests: wireless sensor networks, big data.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (41875184, 61373064), the Six Talent Peaks Innovation Team Project in Jiangsu Province (TD-XYDXX-004), the CERNET Next Generation Internet Technology Innovation Project (NGII20170610, NGII20171204), and the Open Fund of the Jiangsu Provincial Key Laboratory of Agricultural Meteorology (KYQ1309).

Abstract

Abstract: Clustering by fast search and find of Density Peaks (DP) is a recent density-based clustering algorithm that treats each density peak as a potential cluster center. When a single cluster contains multiple density peaks, this makes it difficult to determine the correct number of clusters in the dataset. To solve this problem, a mixed density peaks clustering algorithm, C-DP, is proposed. First, the density peak points are taken as initial cluster centers and the dataset is divided into sub-clusters. Then, borrowing from the hierarchical algorithm Clustering Using Representatives (CURE), scattered representative points are selected from each sub-cluster, the clusters containing the pair of representative points with the smallest distance are merged, and a contraction factor parameter is introduced to control the shape of the clusters. Simulation results show that C-DP clusters better than DP on four synthetic datasets. A comparison of the Rand Index on real datasets shows that C-DP improves on DP by 2.32% on dataset S1 and by 1.13% on dataset 4k2_far. C-DP therefore improves clustering accuracy on datasets in which a single cluster contains multiple density peaks.

Key words: density peak, hierarchical clustering, class merging, representative point, contraction factor
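
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian density kernel, the farthest-point choice of representatives, the stopping rule (merge until `k` clusters remain), and all names and parameters (`density_peaks`, `representatives`, `c_dp`, `rand_index`, `dc`, `n_peaks`, `c`, `alpha`) are assumptions made for the example, since this page does not give those details.

```python
import numpy as np

def density_peaks(X, dc, n_peaks):
    """DP stage: split X into n_peaks sub-clusters around density peaks."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # Gaussian density, self excluded (assumed kernel)
    order = np.argsort(-rho)                         # indices sorted densest first
    delta = np.zeros(len(X))                         # distance to the nearest denser point
    parent = np.full(len(X), -1)                     # index of that nearest denser point
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, len(X)):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(d[i, denser])]
        delta[i], parent[i] = d[i, j], j
    peaks = np.argsort(-(rho * delta))[:n_peaks]     # largest rho * delta become centers
    labels = np.full(len(X), -1)
    labels[peaks] = np.arange(n_peaks)
    for i in order:                                  # each point inherits its parent's label
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels

def representatives(P, c, alpha):
    """CURE-style: pick up to c scattered points, then shrink them toward the mean."""
    mean = P.mean(axis=0)
    reps = [P[np.argmax(np.linalg.norm(P - mean, axis=1))]]
    while len(reps) < min(c, len(P)):
        dist = np.min([np.linalg.norm(P - r, axis=1) for r in reps], axis=0)
        reps.append(P[np.argmax(dist)])              # farthest from the points chosen so far
    reps = np.array(reps)
    return reps + alpha * (mean - reps)              # alpha plays the role of the contraction factor

def c_dp(X, dc, n_peaks, k, c=5, alpha=0.3):
    """Merge DP sub-clusters by closest representative pair until k clusters remain."""
    labels = density_peaks(X, dc, n_peaks)
    while len(np.unique(labels)) > k:
        ids = np.unique(labels)
        reps = {a: representatives(X[labels == a], c, alpha) for a in ids}
        best, pair = np.inf, None
        for u, a in enumerate(ids):
            for b in ids[u + 1:]:
                gap = np.linalg.norm(reps[a][:, None] - reps[b][None], axis=-1).min()
                if gap < best:
                    best, pair = gap, (a, b)
        labels[labels == pair[1]] = pair[0]          # merge the closest pair of clusters
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same or different cluster)."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    return np.mean((a[:, None] == a[None, :])[iu] == (b[:, None] == b[None, :])[iu])
```

Calling `c_dp(X, dc, n_peaks, k)` with `n_peaks` larger than `k` deliberately over-segments the data in the DP stage and leaves the merging stage to rejoin sub-clusters that belong to the same multi-peak cluster. `rand_index` then compares the result with ground-truth labels, which is the kind of score the reported 2.32% and 1.13% improvements refer to.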

CLC number: TP301.6

WANG Jun, ZHOU Kai, CHENG Yong. Mixed density peaks clustering algorithm[J]. Journal of Computer Applications, 2019, 39(2): 403-408.


