Adaptive affinity propagation clustering algorithm based on universal gravitation

Journal of Computer Applications, 2021, Vol. 41, Issue 5: 1337-1342. DOI: 10.11772/j.issn.1001-9081.2020071130

Topic: Data Science and Technology

WANG Zhihe, CHANG Xiaoqing, DU Hui

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China

Received: 2020-07-31; Revised: 2020-09-25; Published: 2021-05-10; Released online: 2020-10-19
Corresponding author: CHANG Xiaoqing
About the authors: WANG Zhihe (born 1965), male, from Wuwei, Gansu; professor, Ph.D.; main research interest: data mining. CHANG Xiaoqing (born 1992), female, from Baiyin, Gansu; master's student; main research interests: data mining and intelligent computing. DU Hui (born 1976), female, from Lanzhou, Gansu; associate professor, Ph.D.; main research interests: data mining and intelligent computing.
Funding: This work is partially supported by the National Natural Science Foundation of China (61962054).

Abstract

Abstract: The Affinity Propagation (AP) clustering algorithm is sensitive to its Preference parameter, is poorly suited to sparse data, and leaves incorrectly clustered sample points in its results. To address these problems, an adaptive affinity propagation clustering algorithm based on universal gravitation (GA-AP) was proposed. Firstly, a gravitational search mechanism was introduced into the traditional AP algorithm to perform global optimization over the sample points. Secondly, on the basis of this global optimization, information entropy and the Adaptive Boosting (AdaBoost) algorithm were used to identify the correctly and incorrectly clustered sample points in each cluster and to calculate their weights. Each sample point was then updated with its weight, so that the similarity, the Preference value, the attractiveness and the membership degree were updated, and clustering was performed again. These steps were repeated until the maximum number of iterations was reached. In simulation experiments on nine datasets, compared with the Affinity Propagation clustering based on Adaptive Attribute Weighting (AFW_AP) algorithm, the AP algorithm, the K-means clustering algorithm and the Fuzzy C-Means (FCM) algorithm, the proposed algorithm improved the average Purity, F-measure and Accuracy (ACC) by up to 0.69, 71.74% and 98.5% respectively. The experimental results show that the proposed algorithm reduces the dependence on Preference and improves the clustering effect, especially the accuracy of clustering results on sparse datasets.

Key words: Affinity Propagation (AP) clustering, preference, law of universal gravitation, information entropy, Adaptive Boosting (AdaBoost) algorithm
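To make the sensitivity to Preference concrete, the following is a minimal sketch of the baseline AP algorithm only, not of GA-AP. It assumes the standard message-passing updates of Frey and Dueck (responsibility and availability, which appear to correspond to what the abstract calls attractiveness and membership degree) and a similarity defined as negative squared Euclidean distance. The function name `affinity_propagation` and the parameters `damping` and `max_iter` are illustrative assumptions. The gravitational search, information entropy and AdaBoost weighting steps are not reproduced, because the abstract does not give their formulas.

```python
import numpy as np

def affinity_propagation(X, preference=None, damping=0.9, max_iter=300):
    """Baseline AP sketch (standard updates); NOT the GA-AP variant."""
    n = X.shape[0]
    # Assumed similarity: negative squared Euclidean distance.
    S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if preference is None:
        # Common default: median of the off-diagonal similarities.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    diag = np.diag_indices(n)
    S[diag] = preference           # s(k,k) is the Preference value
    R = np.zeros((n, n))           # responsibilities r(i,k)
    A = np.zeros((n, n))           # availabilities a(i,k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = np.max(AS, axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[diag] = R[diag]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[diag].copy()
        A_new = np.minimum(A_new, 0)
        A_new[diag] = self_avail
        A = damping * A + (1 - damping) * A_new
    # Points with positive self-evidence are taken as exemplars.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                   rng.normal(5.0, 0.3, (20, 2))])
    for p in (None, -50.0, 0.0):
        ex, _ = affinity_propagation(X, preference=p)
        print(f"preference={p}: {len(ex)} exemplars")
```

With this similarity, a Preference of 0 is at least as large as every other similarity, so every distinct point becomes its own exemplar. Lower values generally yield fewer clusters, which is the dependence on Preference that the proposed algorithm aims to reduce.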

CLC number: TP311.1

WANG Zhihe, CHANG Xiaoqing, DU Hui. Adaptive affinity propagation clustering algorithm based on universal gravitation[J]. Journal of Computer Applications, 2021, 41(5): 1337-1342.
