计算机应用 ›› 2021, Vol. 41 ›› Issue (5): 1337-1342.DOI: 10.11772/j.issn.1001-9081.2020071130

所属专题: 数据科学与技术

• 数据科学与技术 • 上一篇    下一篇

基于万有引力的自适应近邻传播聚类算法

王治和, 常筱卿, 杜辉   

  1. 西北师范大学 计算机科学与工程学院, 兰州 730070
  • 收稿日期:2020-07-31 修回日期:2020-09-25 出版日期:2021-05-10 发布日期:2020-10-19
  • 通讯作者: 常筱卿
  • 作者简介:王治和(1965-),男,甘肃武威人,教授,博士,主要研究方向:数据挖掘;常筱卿(1992-),女,甘肃白银人,硕士研究生,主要研究方向:数据挖掘、智能计算;杜辉(1976-),女,甘肃兰州人,副教授,博士,主要研究方向:数据挖掘、智能计算。
  • 基金资助:
    国家自然科学基金资助项目(61962054)。

Adaptive affinity propagation clustering algorithm based on universal gravitation

WANG Zhihe, CHANG Xiaoqing, DU Hui   

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou Gansu 730070, China
  • Received:2020-07-31 Revised:2020-09-25 Online:2021-05-10 Published:2020-10-19
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61962054).

摘要: 针对近邻传播(AP)聚类算法对参数偏向参数(Preference)敏感、不适用于稀疏数据、聚类结果中会出现错误聚类的样本点的问题,提出基于万有引力的自适应近邻传播聚类(GA-AP)算法。首先,在传统AP算法的基础上采用引力搜索机制对样本进行全局寻优;其次,在全局寻优的基础上利用信息熵和自适应增强(AdaBoost)算法找到每个簇内正确聚类和错误聚类的样本点,并计算出这些样本点的权值,用计算出的权值更新对应的样本点,从而更新相似度、Preference取值、吸引度和隶属度,并进行重新聚类。不断操作以上步骤直到达到最大的迭代次数。通过在9个数据集上的仿真实验得出,相比于基于自适应属性加权的近邻传播聚类(AFW_AP)算法、AP算法、K均值聚类(K-means)算法和模糊C均值(FCM)算法,所提算法的纯度(Purity)、F值(F-measure)和准确率(ACC)的平均值分别最高提升了0.69、71.74%和98.5%。实验结果表明,所提算法降低了对偏向参数的依赖,提高了聚类效果,特别是对于稀疏数据集的聚类结果的准确率。

关键词: 近邻传播聚类, 偏向参数, 万有引力定律, 信息熵, 自适应增强算法

Abstract: Focused on the problem that Affinity Propagation (AP) clustering algorithm is sensitive to parameter Preference, which is not suitable for sparse data, and has the incorrectly clustered sample points in the clustering results, an algorithm named Adaptive Affinity Propagation clustering based on universal gravitation (GA-AP) was proposed. Firstly, the gravitational search mechanism was introduced into the traditional AP algorithm in order to perform the global optimization to the sample points. Secondly, on the basis of global optimization, the correctly clustered and incorrectly clustered sample points in each cluster were found through the information entropy and Adaptive Boosting (AdaBoost) algorithm, the weights of the sample points were calculated. Each sample point was updated by the corresponding weight, so that the similarity, Preference value, attractiveness and membership degree were updated, and the re-clustering was performed. The above steps were continuously operated until the maximum number of iterations was reached. Through simulation experiments on nine datasets, it can be seen that compared to Affinity Propagation clustering based on Adaptive Attribute Weighting (AFW_AP) algorithm, AP algorithm, K-means clustering (K-means) algorithm and Fuzzy C-Means (FCM) algorithm, the proposed algorithm has the average values of Purity, F-measure and Accuracy (ACC) increased by 0.69, 71.74% and 98.5% respectively at most. Experimental results show that the proposed algorithm reduces the dependence on Preference and improves the clustering effect, especially the accuracy of clustering results for sparse datasets.

Key words: Affinity Propagation (AP) clustering, preference, law of universal gravitation, information entropy, Adaptive Boosting (AdaBoost) algorithm

中图分类号: