Journal of Computer Applications (《计算机应用》): sole official website

Topic: Data Science and Technology


基于信念子簇切割的模糊聚类算法

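Ding Yu (丁雨)1, Zhang Hanlin (张瀚霖)2, Luo Rong (罗荣)2, Meng Hua (孟华)3

  1. School of Mathematics, Southwest Jiaotong University
  2. Xipu Campus, Southwest Jiaotong University
  3. Southwest Jiaotong University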
  • 收稿日期:2023-05-18 修回日期:2023-07-06 接受日期:2023-07-14 发布日期:2026-02-05 出版日期:2024-04-10
  • Corresponding author: Ding Yu (丁雨)

Fuzzy clustering algorithm based on belief subcluster cutting

  • Received:2023-05-18 Revised:2023-07-06 Accepted:2023-07-14 Online:2026-02-05 Published:2024-04-10

摘要: 信念峰值聚类算法(BPC)是一种基于模糊视角的密度峰值聚类算法(DPC)的新变体,它用模糊数学的观点刻画数据的分布特征与相关性。但BPC算法在信念值的计算上主要基于局部数据点信息,而未考察整个数据集整体的分布和结构,并且原始的分配策略鲁棒性弱。针对以上问题,提出一种基于信念子簇切割的模糊聚类算法(BSCC),该算法结合了信念峰值和谱方法。首先,通过局部信念信息将数据集划分为众多高纯度子簇;然后,将子簇视作新样本,通过簇间的相似关系,利用谱方法进行割图聚类,从而耦合了局部信息与全局信息;最后,将子簇内的点分配到子簇所在类簇以完成最终聚类。与BPC算法相比,BSCC在带有多子簇结构的数据集上具有明显优势,如在americanflag数据集和car数据集上的准确率分别提高了16.38个百分点和21.35个百分点。在合成数据集和真实数据集上的一系列聚类实验表明,BSCC在调整兰德系数(ARI)、归一化互信息(NMI)和准确率(ACC)这三个评价指标上整体优于BPC和其他7种聚类算法。

关键词: 聚类分析, 密度峰值聚类, 信念峰值聚类, 谱聚类, 信念子簇, 子簇合并

Abstract: The Belief Peaks Clustering (BPC) algorithm is a recent variant of the Density Peaks Clustering (DPC) algorithm that takes a fuzzy perspective: it uses fuzzy mathematics to describe the distribution characteristics and correlations of the data. However, BPC computes belief values mainly from local information around each data point and does not consider the distribution and structure of the dataset as a whole, and its original allocation strategy is not robust. To address these problems, a fuzzy clustering algorithm based on Belief Subcluster Cutting (BSCC) is proposed, which combines belief peaks with a spectral method. First, the dataset is divided into many high-purity subclusters using local belief information. Then, each subcluster is treated as a new sample, and a spectral method performs graph-cut clustering on the similarities between subclusters, thereby coupling local and global information. Finally, the points in each subcluster are assigned to the cluster containing that subcluster, which completes the clustering. Compared with BPC, BSCC has a clear advantage on datasets with a multiple-subcluster structure; for example, its accuracy is 16.38 and 21.35 percentage points higher on the americanflag and car datasets, respectively. A series of clustering experiments on synthetic and real datasets shows that BSCC overall outperforms BPC and seven other clustering algorithms on three evaluation metrics: Adjusted Rand Index (ARI), Normalized Mutual Information (NMI) and Accuracy (ACC).
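The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the belief-value formula or the subcluster similarity, so the sketch assumes a k-nearest-neighbour density as a stand-in for the belief value, a Gaussian kernel on the single-link gap between subclusters as the similarity, and normalized spectral clustering with a small k-means step for the cut. The names `local_subclusters`, `spectral_cut` and `bscc_sketch` are hypothetical.

```python
import numpy as np

def local_subclusters(X, k=5):
    """Stage 1 (stand-in): group points around local density peaks."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]           # k nearest neighbours, self excluded
    knn_dist = np.take_along_axis(D, nn, axis=1)
    density = 1.0 / knn_dist.mean(axis=1)            # ASSUMPTION: proxy for the belief value
    parent = np.arange(len(X))
    for i in range(len(X)):
        higher = nn[i][density[nn[i]] > density[i]]  # neighbours are sorted by distance
        if higher.size:
            parent[i] = higher[0]                    # nearest neighbour with higher density
    for i in range(len(X)):                          # follow the links up to the local peak
        while parent[parent[i]] != parent[i]:
            parent[i] = parent[parent[i]]
    _, sub = np.unique(parent, return_inverse=True)
    return sub, D, knn_dist[:, -1].mean()            # labels, distances, kernel width

def spectral_cut(W, n_clusters):
    """Stage 2: normalized spectral clustering of the subcluster graph W."""
    d = W.sum(axis=1)
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))  # symmetric normalized Laplacian
    _, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    U = vecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    centers = [U[0]]                                  # deterministic farthest-point init
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(50):                               # plain k-means iterations
        labels = np.argmin(np.linalg.norm(U[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(n_clusters)])
    return labels

def bscc_sketch(X, n_clusters, k=5):
    sub, D, sigma = local_subclusters(X, k)
    m = sub.max() + 1
    members = [np.flatnonzero(sub == s) for s in range(m)]
    W = np.ones((m, m))                               # self-similarity of 1 on the diagonal
    for a in range(m):
        for b in range(a + 1, m):
            gap = D[np.ix_(members[a], members[b])].min()  # ASSUMPTION: single-link gap
            W[a, b] = W[b, a] = np.exp(-gap**2 / (2 * sigma**2))
    sub_labels = spectral_cut(W, n_clusters)
    return sub_labels[sub]                            # Stage 3: points inherit the label

# Toy data: two jittered 6x6 grids, far apart, each split into several subclusters.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(6) * 0.2, np.arange(6) * 0.2)
block = np.column_stack([gx.ravel(), gy.ravel()])
X = np.vstack([block, block + [5.0, 0.0]]) + rng.uniform(-0.05, 0.05, size=(72, 2))
labels = bscc_sketch(X, n_clusters=2)
```

The point of the design is that the spectral step runs on a graph with one node per subcluster rather than one node per point, so the eigendecomposition stays small while the final labels still reflect global structure.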

Key words: clustering analysis, density peaks clustering, belief peaks clustering, spectral clustering, belief subcluster, subcluster merging
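For readers unfamiliar with the three evaluation metrics named in the abstract, the following standard-library sketch shows one common way to compute them. The exact variants used in the paper are not stated here, so two choices are assumptions: NMI is normalized by the arithmetic mean of the two entropies, and ACC searches for the best one-to-one label mapping by brute force over permutations (practical only for a handful of clusters; an assignment solver such as the Hungarian algorithm is the usual choice for larger problems). The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def _pair_sum(counter):
    # Number of unordered pairs that fall inside the same cell.
    return sum(math.comb(c, 2) for c in counter.values())

def adjusted_rand_index(y_true, y_pred):
    n = len(y_true)
    s_joint = _pair_sum(Counter(zip(y_true, y_pred)))
    s_true, s_pred = _pair_sum(Counter(y_true)), _pair_sum(Counter(y_pred))
    expected = s_true * s_pred / math.comb(n, 2)
    max_index = (s_true + s_pred) / 2
    if max_index == expected:            # degenerate case, e.g. a single cluster
        return 1.0
    return (s_joint - expected) / (max_index - expected)

def normalized_mutual_information(y_true, y_pred):
    n = len(y_true)
    joint, pt, pp = Counter(zip(y_true, y_pred)), Counter(y_true), Counter(y_pred)
    mi = sum(c / n * math.log(c * n / (pt[t] * pp[p])) for (t, p), c in joint.items())
    def entropy(counter):
        return -sum(c / n * math.log(c / n) for c in counter.values())
    denom = (entropy(pt) + entropy(pp)) / 2   # ASSUMPTION: arithmetic-mean normalization
    return mi / denom if denom > 0 else 1.0

def clustering_accuracy(y_true, y_pred):
    true_ids, pred_ids = sorted(set(y_true)), sorted(set(y_pred))
    counts = Counter(zip(y_pred, y_true))
    # Pad with None so surplus predicted clusters can be left unmatched.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        best = max(best, sum(counts[(p, t)] for p, t in zip(pred_ids, perm)))
    return best / len(y_true)

# Cluster ids are arbitrary, so a relabelled perfect partition still scores 1.
y_true = [0, 0, 0, 1, 1, 1]
y_perfect = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]      # one point of the first class is misassigned
acc = clustering_accuracy(y_true, y_pred)
ari = adjusted_rand_index(y_true, y_pred)
nmi = normalized_mutual_information(y_true, y_pred)
```

All three metrics are invariant to how the cluster ids are named, which is why `y_perfect` scores 1 on each of them even though its ids are swapped relative to `y_true`.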
