Journal of Computer Applications, 2020, Vol. 40, Issue (6): 1654-1661. DOI: 10.11772/j.issn.1001-9081.2019111881

• Data science and technology •

Adaptive density peaks clustering algorithm

WU Bin, LU Hongli, JIANG Huijun   

  1. Industrial Engineering Department, Nanjing Tech University, Nanjing Jiangsu 211816, China
  • Received: 2019-11-05 Revised: 2020-01-03 Online: 2020-06-10 Published: 2020-06-18
  • Contact: WU Bin, born in 1979, Ph. D., associate professor. His research interests include system modeling and simulation.
  • About author: LU Hongli, born in 1995, M. S. candidate. Her research interests include system modeling and simulation. WU Bin, born in 1979, Ph. D., associate professor. His research interests include system modeling and simulation. JIANG Huijun, born in 1996, M. S. candidate. Her research interests include logistics supply chain management.
  • Supported by:
    National Natural Science Foundation of China (71671089).

自适应密度峰值聚类算法

吴斌, 卢红丽, 江惠君   

  1. 南京工业大学 工业工程系,南京 211816
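  • Corresponding author: 吴斌 (WU Bin, born in 1979)
  • About the authors: 吴斌 (WU Bin, born in 1979), male, from Zhengzhou, Henan, associate professor, Ph. D.; main research interests: system modeling and simulation. 卢红丽 (LU Hongli, born in 1995), female, from Shangqiu, Henan, M. S. candidate; main research interests: system modeling and simulation. 江惠君 (JIANG Huijun, born in 1996), female, from Yangzhou, Jiangsu, M. S. candidate; main research interests: logistics supply chain management.
  • Funding:
    National Natural Science Foundation of China (71671089).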

Abstract: The Density Peaks Clustering (DPC) algorithm is a recent clustering algorithm whose advantages include few tunable parameters, no iterative solving, and the ability to find non-spherical clusters. However, it has two drawbacks: the cutoff distance cannot be adjusted automatically, and the cluster centers must be selected manually. To address these problems, an Adaptive DPC (ADPC) algorithm is proposed, which adjusts the cutoff distance adaptively based on the Gini coefficient and selects the cluster centers automatically. First, the formula for the cluster center weight is redefined to take both local density and relative distance into account. Then, an adaptive cutoff distance adjustment method is established based on the Gini coefficient. Finally, a strategy for automatically selecting cluster centers is derived from the decision graph and the sorted cluster center weight graph. Simulation results show that the ADPC algorithm can adjust the cutoff distance and identify the cluster centers automatically according to the characteristics of the problem, and that on the test datasets it obtains better results than several commonly used clustering algorithms and improved DPC algorithms.

Key words: density peak, cutoff distance, automatic clustering, Gini coefficient, clustering center
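
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the ADPC formulas, so the Gaussian-kernel density, the normalised product used as the cluster center weight, the Gini index of the form 1 - sum(p^2), and the rule "pick the candidate cutoff with the smallest Gini index" are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def density_and_distance(D, d_c):
    """Local density rho and relative distance delta, as in standard DPC."""
    # Gaussian-kernel density (assumed); subtract the self term exp(0) = 1.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point
        # gets its largest distance to any other point instead.
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta

def center_weights(rho, delta):
    """Assumed weight: product of max-normalised rho and delta."""
    return (rho / rho.max()) * (delta / delta.max())

def gini_index(w):
    """Assumed Gini index of the normalised weights: 1 - sum(p_i^2)."""
    p = w / w.sum()
    return 1.0 - np.sum(p ** 2)

def pick_cutoff(D, candidates):
    """Assumed rule: choose the candidate d_c with the smallest Gini index."""
    scores = [gini_index(center_weights(*density_and_distance(D, d)))
              for d in candidates]
    return candidates[int(np.argmin(scores))]
```

Under these assumptions, a small Gini index means the weight mass is concentrated on a few points, which are then the natural cluster center candidates; sorting `center_weights` in descending order gives the kind of ranking the abstract refers to as the cluster center weight sort graph.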

