Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1472-1479. DOI: 10.11772/j.issn.1001-9081.2021030515
Data science and technology
Received: 2021-04-06
Revised: 2021-07-09
Accepted: 2021-07-14
Online: 2022-06-11
Published: 2022-05-10
Contact: Yan MA
About author: DU Jie, born in 1996 in Huzhou, Zhejiang, M.S. candidate. Her research interests include pattern recognition and image processing.
Jie DU, Yan MA, Hui HUANG. Clustering algorithm based on local gravity and distance[J]. Journal of Computer Applications, 2022, 42(5): 1472-1479.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030515
Dataset | Samples | Dimensions | Classes |
---|---|---|---|
DS1 | 788 | 2 | 7 |
DS2 | 2 000 | 2 | 5 |
DS3 | 1 390 | 2 | 4 |
DS4 | 1 000 | 2 | 4 |
Wine | 178 | 13 | 3 |
SCADI | 70 | 205 | 7 |
Soybean | 47 | 35 | 4 |
Waveform | 5 000 | 21 | 3 |
Libras | 360 | 90 | 15 |
Statlog | 4 435 | 36 | 7 |
Tab. 1 Information of datasets
Dataset | LGC: initial momentum | LGC: k | CLA: center threshold | CLA: k | LGDC: k |
---|---|---|---|---|---|
DS1 | 10 | 11 | 0.1 | 30 | 26 |
DS2 | 30 | 10 | 0.1 | 60 | 40 |
DS3 | 30 | 7 | 0.2 | 30 | 40 |
DS4 | 30 | 15 | 0 | 60 | 30 |
Wine | 10 | 5 | 0.5 | 14 | 14 |
SCADI | 8 | 7 | 0.5 | 7 | 7 |
Soybean | 5 | 5 | 0.5 | 10 | 5 |
Waveform | 40 | 25 | 0.1 | 60 | 40 |
Libras | 3 | 5 | 0.3 | 20 | 18 |
Statlog | 10 | 23 | 0.5 | 60 | 40 |
Tab. 2 Parameter settings of different algorithms on different datasets
Clustering algorithm | Metric | Wine | SCADI | Soybean | Waveform | Libras | Statlog |
---|---|---|---|---|---|---|---|
K-means | RI | 0.709 8 | 0.807 6 | 0.922 9 | 0.667 3 | 0.904 3 | 0.830 7 |
K-means | ARI | 0.364 0 | 0.431 9 | 0.808 3 | 0.253 6 | 0.295 6 | 0.462 6 |
K-means | FM | 0.587 7 | 0.559 0 | 0.863 3 | 0.503 9 | 0.349 1 | 0.568 3 |
DPC | RI | 0.719 1 | 0.699 4 | 0.898 2 | 0.628 5 | 0.868 8 | 0.821 7 |
DPC | ARI | 0.371 5 | 0.189 1 | 0.725 1 | 0.189 7 | 0.261 6 | 0.479 1 |
DPC | FM | 0.587 7 | 0.388 2 | 0.792 7 | 0.476 7 | 0.345 8 | 0.595 3 |
GDPC | RI | 0.716 2 | 0.696 9 | 0.898 2 | 0.614 3 | 0.150 0 | 0.635 0 |
GDPC | ARI | 0.457 6 | 0.354 4 | 0.725 1 | 0.239 4 | 0.001 0 | 0.169 4 |
GDPC | FM | 0.711 9 | 0.583 2 | 0.792 7 | 0.562 4 | 0.242 2 | 0.411 6 |
LGC | RI | 0.718 5 | 0.754 5 | 0.842 7 | 0.333 3 | 0.757 4 | 0.754 7 |
LGC | ARI | 0.456 8 | 0.473 2 | 0.653 7 | 0.000 0 | 0.126 5 | 0.441 8 |
LGC | FM | 0.706 6 | 0.664 1 | 0.783 9 | 0.577 3 | 0.265 5 | 0.627 3 |
CLA | RI | 0.713 5 | 0.840 6 | 0.842 7 | 0.333 3 | 0.760 7 | 0.693 2 |
CLA | ARI | 0.453 6 | 0.606 7 | 0.653 7 | 0.000 0 | 0.169 0 | 0.304 6 |
CLA | FM | 0.710 5 | 0.719 2 | 0.783 9 | 0.577 3 | 0.317 9 | 0.527 4 |
LGDC | RI | 0.824 9 | 0.865 0 | 1.000 0 | 0.697 8 | 0.906 5 | 0.836 2 |
LGDC | ARI | 0.613 2 | 0.657 2 | 1.000 0 | 0.341 7 | 0.344 6 | 0.502 0 |
LGDC | FM | 0.747 1 | 0.750 2 | 1.000 0 | 0.575 8 | 0.399 5 | 0.606 3 |
Tab. 3 Result comparison of six clustering algorithms on real datasets
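The RI, ARI and FM columns in Tab. 3 are pair-counting indices: each one compares how the predicted clustering and the ground truth treat every pair of points. As an illustration only, the Python sketch below computes all three from two label vectors using the standard textbook definitions (Rand index, Hubert-Arabie adjusted Rand index, Fowlkes-Mallows index). The function names are hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter
from math import comb, sqrt


def _pair_counts(labels_true, labels_pred):
    """Count point pairs for two labelings of the same n points.

    Returns (both, same_true, same_pred, total), where
      both      = pairs placed together in both labelings,
      same_true = pairs placed together in the ground truth,
      same_pred = pairs placed together in the predicted clustering,
      total     = all n*(n-1)/2 pairs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points")
    cells = Counter(zip(labels_true, labels_pred))  # contingency table n_ij
    both = sum(comb(v, 2) for v in cells.values())
    same_true = sum(comb(v, 2) for v in Counter(labels_true).values())
    same_pred = sum(comb(v, 2) for v in Counter(labels_pred).values())
    return both, same_true, same_pred, comb(n, 2)


def rand_index(labels_true, labels_pred):
    """RI: fraction of pairs on which the two labelings agree."""
    both, st, sp, total = _pair_counts(labels_true, labels_pred)
    apart_both = total - st - sp + both  # pairs separated in both labelings
    return (both + apart_both) / total


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair agreement corrected for the agreement expected by chance."""
    both, st, sp, total = _pair_counts(labels_true, labels_pred)
    expected = st * sp / total
    max_index = (st + sp) / 2
    if max_index == expected:
        # Happens only when both labelings are a single cluster,
        # or both are all singletons; the labelings then agree fully.
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows(labels_true, labels_pred):
    """FM: geometric mean of pairwise precision and pairwise recall."""
    both, st, sp, _ = _pair_counts(labels_true, labels_pred)
    if st == 0 or sp == 0:
        return 0.0  # no same-cluster pairs on one side; treated as 0 by convention
    return both / sqrt(st * sp)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    pred = [0, 0, 1, 2]
    print(rand_index(truth, pred), adjusted_rand_index(truth, pred),
          fowlkes_mallows(truth, pred))
```

For the small example in the main block, the three indices come out at roughly 0.833, 0.571 and 0.707. One degenerate case helps when reading the table: if every point is put into a single cluster and the three true classes are roughly equal in size, the formulas give RI close to 1/3, ARI = 0 and FM close to 0.577. The LGC and CLA entries for Waveform in Tab. 3 show this pattern, although the source does not state the cause.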