《计算机应用》唯一官方网站 ›› 2022, Vol. 42 ›› Issue (5): 1464-1471.DOI: 10.11772/j.issn.1001-9081.2021050753

• 数据科学与技术 • 上一篇    下一篇

基于自适应近邻参数的密度峰聚类算法

周欢欢1, 郑伯川2(), 张征1, 张琦1   

  1. 1.西华师范大学 数学与信息学院,四川 南充 637009
    2.西华师范大学 计算机学院,四川 南充 637009
  • 收稿日期:2021-05-11 修回日期:2021-08-27 接受日期:2021-08-30 发布日期:2022-03-08 出版日期:2022-05-10
  • 通讯作者: 郑伯川
  • 作者简介:周欢欢(1996—),女,重庆人,硕士研究生,主要研究方向:机器学习、聚类分析
    郑伯川(1974—),男,四川自贡人,教授,博士,CCF会员,主要研究方向:机器学习、深度学习、计算机视觉 zhengbc@vip.163.com
    张征(1978—),女,四川自贡人,副教授,硕士,主要研究方向:运筹与优化
    张琦(1996—),女,重庆人,硕士研究生,主要研究方向:机器学习、聚类分析。
  • 基金资助:
    国家自然科学基金资助项目(62176217)

Density peak clustering algorithm based on adaptive nearest neighbor parameters

Huanhuan ZHOU1, Bochuan ZHENG2(), Zheng ZHANG1, Qi ZHANG1   

  1. 1.School of Mathematics and Information,China West Normal University,Nanchong Sichuan 637009,China
    2.School of Computer Science,China West Normal University,Nanchong Sichuan 637009,China
  • Received:2021-05-11 Revised:2021-08-27 Accepted:2021-08-30 Online:2022-03-08 Published:2022-05-10
  • Contact: Bochuan ZHENG
  • About author:ZHOU Huanhuan, born in 1996,M. S. candidate. Her researchinterests include machine learning,clustering analysis.
    ZHENG Bochuan, interests include machine learning,deep learning,computer vision.
    ZHANG Zheng, born in 1978,M. S.,associate professor. Herresearch interests include operations research and optimization.
    ZHANG Qi, born in 1996,M. S. candidate. Her research interestsinclude machine learning,clustering analysis.
  • Supported by:
    National Natural Science Foundation of China(62176217)

摘要:

针对基于共享最近邻的密度峰聚类算法中的近邻参数需要人为设定的问题,提出了一种基于自适应近邻参数的密度峰聚类算法。首先,利用所提出的近邻参数搜索算法自动获得近邻参数;然后,通过决策图选取聚类中心;最后,根据所提出的代表点分配策略,先分配代表点,后分配非代表点,从而实现所有样本点的聚类。将所提出的算法与基于共享最近邻的快速密度峰搜索聚类(SNN?DPC)、基于密度峰值的聚类(DPC)、近邻传播聚类(AP)、对点排序来确定聚类结构(OPTICS)、基于密度的噪声应用空间聚类(DBSCAN)和K-means这6种算法在合成数据集以及UCI数据集上进行聚类结果对比。实验结果表明,所提出的算法在调整互信息(AMI)、调整兰德系数(ARI)和FM指数(FMI)等评价指标上整体优于其他6种算法。所提算法能自动获得有效的近邻参数,且能较好地分配簇边缘区域的样本点。

关键词: 共享最近邻, 局部密度, 密度峰聚类, k-近邻, 逆近邻

Abstract:

Aiming at the problem that the nearest neighbor parameters need to be set manually in density peak clustering algorithm based on shared nearest neighbor, a density peak clustering algorithm based on adaptive nearest neighbor parameters was proposed. Firstly, the proposed nearest neighbor parameter search algorithm was used to automatically obtain the nearest neighbor parameters. Then, the clustering centers were selected through the decision diagram. Finally, according to the proposed allocation strategy of representative points, all sample points were clustered through allocating the representative points and the non-representative points sequentially. The clustering results of the proposed algorithm was compared with those of the six algorithms such as Shared-Nearest-Neighbor-based Clustering by fast search and find of Density Peaks (SNN?DPC), Clustering by fast search and find of Density Peaks (DPC), Affinity Propagation (AP), Ordering Points To Identify the Clustering Structure (OPTICS), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and K-means on the synthetic datasets and UCI datasets. Experimental results show that, the proposed algorithm is better than the other six algorithms on the evaluation indicators such as Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI) and Fowlkes and Mallows Index (FMI). The proposed algorithm can automatically obtain the effective nearest neighbor parameters, and can better allocate the sample points in the edge region of the cluster.

Key words: shared nearest neighbor, local density, density peak clustering, k-neighbor, inverse neighbor

中图分类号: