Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3755-3763. DOI: 10.11772/j.issn.1001-9081.2023010094

• Section: Data science and technology

Agglomerative hierarchical clustering algorithm based on hesitant fuzzy set

Wenquan LI, Yimin MAO, Xindong PENG

  1. School of Information Engineering, Shaoguan University, Shaoguan Guangdong 512005, China
  • Received: 2023-02-07; Revised: 2023-05-05; Accepted: 2023-05-08; Online: 2023-06-06; Published: 2023-12-10
  • Contact: Wenquan LI
  • About author: LI Wenquan, born in 1980, native of Longnan, Jiangxi, M. S., associate professor. His research interests include data mining and fuzzy mathematics. Email: 78192128@qq.com
    MAO Yimin, born in 1970, native of Yili, Xinjiang, Ph. D., professor. Her research interests include data mining and big data security.
    PENG Xindong, born in 1990, native of Jiujiang, Jiangxi, Ph. D., associate professor. His research interests include fuzzy mathematics and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (62006155); Scientific Research Project of Department of Education of Guangdong Province (2022ZDJS048); Characteristic Innovation Project in Ordinary Universities in Guangdong Province (2023KTSCX137)

Abstract:

Aiming at the problems of information distortion, poor objectivity of attribute weights, and high time complexity in hesitant fuzzy clustering analysis, an Agglomerative Hierarchical Clustering algorithm based on Hesitant Fuzzy set (AHCHF) was proposed. Firstly, data objects with lower hesitancy were extended using the average value of their hesitant fuzzy elements. Secondly, the weights of the data objects before and after extension were calculated using the original information entropy and the internal maximum difference, and the comprehensive attribute weights were determined from the minimum discrimination information between the two weight vectors. Finally, with the aim of reducing the sum of weighted distances, a center point construction method that keeps the hesitancy constant was presented. Experimental results on specific examples and synthetic datasets show that, compared with the classic Hesitant Fuzzy Hierarchical Clustering algorithm (HFHC) and the recent Fuzzy Hierarchical Clustering Algorithm (FHCA), AHCHF increases the mean Silhouette Coefficient (SC) by 23.99% and 9.28%, respectively, and reduces the running time by 27.18% and 6.40% on average, respectively. These results verify that the proposed algorithm effectively solves the problems of information distortion and poor objectivity of attribute weights, and improves both clustering quality and clustering performance.
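The abstract does not spell out the extension rule or the distance used. As a minimal sketch, assuming each hesitant fuzzy element (HFE) is a list of membership degrees in [0, 1], that the shorter of two HFEs is padded with its own mean until both have the same length, and that objects are compared with a weighted normalized Hamming distance (a common choice in the hesitant fuzzy literature, not necessarily the authors' exact formula), the extension step could look like this. The names `extend_hfe` and `weighted_hamming` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

HFE = List[float]  # hesitant fuzzy element: possible membership degrees in [0, 1]


def extend_hfe(h: Sequence[float], length: int) -> HFE:
    """Pad an HFE with its own mean until it has `length` values, then sort ascending.

    Assumption: mean-padding as described in the abstract; sorting lets the
    i-th values of two HFEs be compared position by position.
    """
    if not h:
        raise ValueError("an HFE must contain at least one membership degree")
    if len(h) > length:
        raise ValueError("target length is shorter than the HFE")
    mean = fmean(h)
    return sorted(list(h) + [mean] * (length - len(h)))


def weighted_hamming(x: Sequence[Sequence[float]],
                     y: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> float:
    """Weighted normalized Hamming distance between two objects made of HFEs.

    For each attribute, both HFEs are extended to a common length first
    (hypothetical choice: the length of the longer one).
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("objects and weights must have the same number of attributes")
    total = 0.0
    for hx, hy, w in zip(x, y, weights):
        n = max(len(hx), len(hy))
        ex, ey = extend_hfe(hx, n), extend_hfe(hy, n)
        # mean absolute difference of the aligned membership degrees, scaled by the weight
        total += w * sum(abs(a - b) for a, b in zip(ex, ey)) / n
    return total
```

For example, with x = [[0.2, 0.4], [0.5]], y = [[0.3, 0.5, 0.7], [0.6, 0.8]] and weights [0.5, 0.5], the first HFE of x becomes [0.2, 0.3, 0.4], the second becomes [0.5, 0.5], and the distance is 0.2 up to floating-point rounding.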
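If the minimum discrimination information step is read as the usual symmetric cross-entropy problem, minimizing sum_j [w_j ln(w_j / w1_j) + w_j ln(w_j / w2_j)] subject to sum_j w_j = 1, then setting the derivative of the Lagrangian to zero gives ln(w_j^2 / (w1_j w2_j)) = constant, so w_j is proportional to sqrt(w1_j * w2_j). The sketch below assumes that formulation. The inputs w1 and w2 stand in for the entropy-based and maximum-difference-based weight vectors, whose exact formulas the abstract does not give, and `combine_weights_mdi` is a hypothetical name.

```python
import math
from typing import List, Sequence


def combine_weights_mdi(w1: Sequence[float], w2: Sequence[float]) -> List[float]:
    """Combine two attribute-weight vectors by minimum discrimination information.

    Assumed objective: minimise sum_j [w_j*ln(w_j/w1_j) + w_j*ln(w_j/w2_j)]
    subject to sum_j w_j = 1. Its closed-form solution is the normalised
    geometric mean w_j = sqrt(w1_j*w2_j) / sum_k sqrt(w1_k*w2_k).
    """
    if len(w1) != len(w2) or not w1:
        raise ValueError("weight vectors must be non-empty and of equal length")
    if any(a <= 0 for a in w1) or any(b <= 0 for b in w2):
        raise ValueError("weights must be strictly positive")
    geo = [math.sqrt(a * b) for a, b in zip(w1, w2)]  # per-attribute geometric means
    s = sum(geo)
    return [g / s for g in geo]  # renormalise so the weights sum to 1
```

For instance, combining [0.5, 0.3, 0.2] with [0.2, 0.3, 0.5] gives approximately [0.339, 0.322, 0.339]: attributes on which the two vectors disagree are pulled toward a compromise, and identical inputs are returned unchanged.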
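The reported gains are measured with the mean Silhouette Coefficient. For reference, here is a minimal sketch of the standard SC computed from a precomputed, symmetric distance matrix; for hesitant fuzzy data the matrix would come from whatever object distance the clustering uses. The helper name `silhouette_coefficient` is hypothetical, and scoring singleton clusters as 0 follows a common convention rather than anything stated in the abstract.

```python
from typing import Sequence


def silhouette_coefficient(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient from a precomputed distance matrix.

    s(i) = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from i
    to the other members of its cluster and b_i is the smallest mean distance
    from i to the members of any other cluster. Singleton points score 0.
    """
    labels = list(labels)
    n = len(labels)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n-1 clusters")
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # nearest other cluster by mean distance
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / n
```

As a quick check, four points at positions 0, 1, 10 and 11 split into clusters {0, 1} and {10, 11} give a mean SC of about 0.900, close to the maximum of 1 for well-separated clusters.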

Key words: hesitant fuzzy set, clustering analysis, hesitancy degree, data mining, fuzzy entropy
