Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1128-1138. DOI: 10.11772/j.issn.1001-9081.2023050610

• Data Science and Technology •

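Fuzzy clustering algorithm based on belief subcluster cutting

Yu DING, Hanlin ZHANG, Rong LUO, Hua MENG

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 611756
  • Received: 2023-05-22 Revised: 2023-07-06 Accepted: 2023-07-14 Online: 2023-08-01 Published: 2024-04-10
  • Corresponding author: Rong LUO
  • About the authors: DING Yu (born 1999), female, from Chengdu, Sichuan, M.S. candidate. Her research interests include machine learning and clustering analysis.
    ZHANG Hanlin (born 1998), male, from Wusheng, Sichuan, M.S. candidate. His research interests include machine learning, feature extraction and dimensionality reduction of data, and clustering analysis.
    LUO Rong (born 1980), male, from Bazhong, Sichuan, associate professor, Ph.D. His research interests include algebraic coding and data mining. luorong@swjtu.edu.cn
    MENG Hua (born 1982), male, from Xingtai, Hebei, associate professor, Ph.D., CCF member. His research interests include interpretability in deep learning, topological data analysis, and knowledge representation and reasoning.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2682023ZTPY027)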

Fuzzy clustering algorithm based on belief subcluster cutting

Yu DING, Hanlin ZHANG, Rong LUO, Hua MENG

  1. School of Mathematics, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received: 2023-05-22 Revised: 2023-07-06 Accepted: 2023-07-14 Online: 2023-08-01 Published: 2024-04-10
  • Contact: Rong LUO
  • About the authors: DING Yu, born in 1999, M.S. candidate. Her research interests include machine learning and clustering analysis.
    ZHANG Hanlin, born in 1998, M.S. candidate. His research interests include machine learning, feature extraction and dimensionality reduction of data, and clustering analysis.
    LUO Rong, born in 1980, Ph.D., associate professor. His research interests include algebraic coding and data mining.
    MENG Hua, born in 1982, Ph.D., associate professor. His research interests include interpretability in deep learning, topological data analysis, and knowledge representation and reasoning.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2682023ZTPY027)

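Abstract:

The Belief Peaks Clustering (BPC) algorithm is a new variant of the Density Peaks Clustering (DPC) algorithm built on a fuzzy perspective: it characterizes the distribution and correlation of data in terms of fuzzy mathematics. However, BPC computes belief values mainly from local data-point information without examining the overall distribution and structure of the dataset, and its original assignment strategy is not robust. To address these problems, a fuzzy clustering algorithm based on Belief Subcluster Cutting (BSCC) is proposed, which combines belief peaks with the spectral method. First, the dataset is partitioned into many high-purity subclusters using local belief information. Second, each subcluster is treated as a new sample, and graph-cut clustering is performed with the spectral method on the similarity relations between subclusters, thereby coupling local and global information. Finally, the points in each subcluster are assigned to the cluster containing that subcluster, completing the final clustering. Compared with BPC, BSCC has a clear advantage on datasets with multi-subcluster structure; for example, its accuracy (ACC) is 16.38 and 21.35 percentage points higher on the americanflag and Car datasets, respectively. Clustering experiments on synthetic and real datasets show that BSCC is overall better than BPC and seven other clustering algorithms on three evaluation metrics: Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and ACC.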
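
The three stages outlined in the abstract (over-segment into subclusters, cut a graph over the subclusters, propagate labels back to the points) can be illustrated with a minimal Python sketch. It is not the BSCC algorithm itself: the abstract does not specify how belief values or subcluster similarities are computed, so the sketch assumes k-means as a stand-in for the belief-based subcluster step and a Gaussian kernel on centroid distances as the subcluster similarity. The function name `subcluster_then_cut` and all of its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons


def subcluster_then_cut(X, n_subclusters=20, n_clusters=2, seed=0):
    """Illustrative three-stage pipeline; NOT the authors' BSCC implementation."""
    # Stage 1 (assumption): over-segment the data into many small subclusters.
    # k-means is used here only as a stand-in for the belief-peak step.
    km = KMeans(n_clusters=n_subclusters, n_init=10, random_state=seed).fit(X)
    sub_labels = km.labels_
    centers = km.cluster_centers_

    # Stage 2: treat each subcluster as one sample and build a similarity
    # matrix between subclusters (assumption: Gaussian kernel on centroid
    # distances, with the median distance as bandwidth).
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    sigma = np.median(dist[dist > 0])
    affinity = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

    # Spectral clustering on the precomputed subcluster affinity matrix.
    spectral = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    sub_to_cluster = spectral.fit_predict(affinity)

    # Stage 3: every point inherits the cluster of its subcluster.
    return sub_to_cluster[sub_labels], sub_labels


# Usage on a synthetic two-moons dataset.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
labels, sub_labels = subcluster_then_cut(X)
```

Because the graph is cut over subclusters rather than individual points, the spectral step works on an `n_subclusters` by `n_subclusters` matrix, and all points in one subcluster necessarily receive the same final label.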

Key words: clustering analysis, density peaks clustering, belief peaks clustering, spectral clustering, belief subcluster, subcluster merging

Abstract:

The Belief Peaks Clustering (BPC) algorithm is a new variant of the Density Peaks Clustering (DPC) algorithm based on a fuzzy perspective. It uses fuzzy mathematics to describe the distribution characteristics and correlation of data. However, the BPC algorithm computes belief values mainly from the information of local data points and does not examine the distribution and structure of the whole dataset. Moreover, its original allocation strategy has weak robustness. To solve these problems, a fuzzy clustering algorithm based on Belief Subcluster Cutting (BSCC) was proposed by combining belief peaks with the spectral method. Firstly, the dataset was divided into many high-purity subclusters using local belief information. Then, each subcluster was regarded as a new sample, and the spectral method was used for graph-cut clustering based on the similarity relationships between subclusters, thus coupling local information with global information. Finally, the points in each subcluster were assigned to the cluster containing that subcluster to complete the final clustering. Compared with the BPC algorithm, BSCC has obvious advantages on datasets with multiple-subcluster structure; for example, its Accuracy (ACC) was 16.38 and 21.35 percentage points higher on the americanflag dataset and the Car dataset, respectively. Clustering experimental results on synthetic and real datasets show that BSCC overall outperforms BPC and seven other clustering algorithms on three evaluation indicators: Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and ACC.
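
For readers unfamiliar with the three evaluation indicators, the following Python sketch shows one common way to compute them. ARI and NMI come directly from scikit-learn. ACC for clustering is assumed here to be the usual definition, where predicted cluster labels are first matched one-to-one to true classes with the Hungarian algorithm; the abstract does not state which definition the paper uses, and the helper name `clustering_accuracy` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one matching of clusters to classes (assumed definition)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Re-index arbitrary label values to 0..k-1.
    true_ids, true_idx = np.unique(y_true, return_inverse=True)
    pred_ids, pred_idx = np.unique(y_pred, return_inverse=True)
    # Contingency table: rows are predicted clusters, columns are true classes.
    counts = np.zeros((len(pred_ids), len(true_ids)), dtype=np.int64)
    np.add.at(counts, (pred_idx, true_idx), 1)
    # Hungarian algorithm: pick the matching that maximizes matched points.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


# Usage: the predicted labels are a pure renaming of the true ones,
# so all three indicators reach their maximum value.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
acc = clustering_accuracy(y_true, y_pred)
ari = adjusted_rand_score(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
```

All three indicators are invariant to how cluster labels are numbered, which is why the matching step is needed before ACC can be counted.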

Key words: clustering analysis, density peaks clustering, belief peaks clustering, spectral clustering, belief subcluster, subcluster merging

CLC Number: