Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1527-1538.DOI: 10.11772/j.issn.1001-9081.2023050727
Special Issue: 数据科学与技术
• Data science and technology • Previous Articles Next Articles
Tongtong XU1, Bin XIE1,2,3(), Chunhao ZHANG1, Ximei ZHANG1
Received:
2023-06-06
Revised:
2023-07-09
Accepted:
2023-07-14
Online:
2023-08-01
Published:
2024-05-10
Contact:
Bin XIE
About author:
XU Tongtong, born in 1998, M. S. candidate. Her research interests include machine learning, cyber security.Supported by:
通讯作者:
解滨
作者简介:
徐童童(1998—),女,河北衡水人,硕士研究生,CCF会员,主要研究方向:机器学习、网络安全基金资助:
CLC Number:
Tongtong XU, Bin XIE, Chunhao ZHANG, Ximei ZHANG. Multi-order nearest neighbor graph clustering algorithm by fusing transition probability matrix[J]. Journal of Computer Applications, 2024, 44(5): 1527-1538.
徐童童, 解滨, 张春昊, 张喜梅. 融合转移概率矩阵的多阶最近邻图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1527-1538.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023050727
算法 | 时间复杂度 | 算法 | 时间复杂度 |
---|---|---|---|
K-means[ | SC[ | ||
DPC[ | CTCEHC[ | ||
SNN-DPC[ | LDP-SC[ | ||
AGNES[ | MNNGC |
Tab.1 Time complexity comparison among different algorithms
算法 | 时间复杂度 | 算法 | 时间复杂度 |
---|---|---|---|
K-means[ | SC[ | ||
DPC[ | CTCEHC[ | ||
SNN-DPC[ | LDP-SC[ | ||
AGNES[ | MNNGC |
数据集 | 实例数 | 属性数 | 类数 | ||
---|---|---|---|---|---|
名称 | 简称 | ||||
合成 数据集 | Jain[ | s1 | 373 | 2 | 2 |
Spiral[ | s2 | 312 | 2 | 3 | |
R15[ | s3 | 600 | 2 | 15 | |
4k2_far[ | s4 | 400 | 2 | 4 | |
Circle[ | s5 | 299 | 2 | 3 | |
Happy[ | s6 | 266 | 2 | 3 | |
Compound[ | s7 | 399 | 2 | 6 | |
PathBased[ | s8 | 300 | 2 | 3 | |
UCI 数据集 | Glass | u1 | 214 | 9 | 6 |
Ionosphere | u2 | 351 | 33 | 2 | |
Knowledge | u3 | 172 | 5 | 4 | |
Planning | u4 | 182 | 12 | 2 | |
Flags | u5 | 194 | 28 | 8 | |
Haberman-normal | u6 | 306 | 3 | 2 | |
Balance-scale | u7 | 625 | 4 | 3 | |
Bupa | u8 | 345 | 6 | 2 | |
Breast-cancer | u9 | 286 | 9 | 2 | |
Ecoli | u10 | 336 | 7 | 8 |
Tab. 2 Detailed information of synthetic and UCI datasets
数据集 | 实例数 | 属性数 | 类数 | ||
---|---|---|---|---|---|
名称 | 简称 | ||||
合成 数据集 | Jain[ | s1 | 373 | 2 | 2 |
Spiral[ | s2 | 312 | 2 | 3 | |
R15[ | s3 | 600 | 2 | 15 | |
4k2_far[ | s4 | 400 | 2 | 4 | |
Circle[ | s5 | 299 | 2 | 3 | |
Happy[ | s6 | 266 | 2 | 3 | |
Compound[ | s7 | 399 | 2 | 6 | |
PathBased[ | s8 | 300 | 2 | 3 | |
UCI 数据集 | Glass | u1 | 214 | 9 | 6 |
Ionosphere | u2 | 351 | 33 | 2 | |
Knowledge | u3 | 172 | 5 | 4 | |
Planning | u4 | 182 | 12 | 2 | |
Flags | u5 | 194 | 28 | 8 | |
Haberman-normal | u6 | 306 | 3 | 2 | |
Balance-scale | u7 | 625 | 4 | 3 | |
Bupa | u8 | 345 | 6 | 2 | |
Breast-cancer | u9 | 286 | 9 | 2 | |
Ecoli | u10 | 336 | 7 | 8 |
算法 | 参数 | 取值范围 | 算法 | 参数 | 取值范围 |
---|---|---|---|---|---|
AGNES | c | — | SC | c | — |
DPC | (0.5%,2%) | CTCEHC | c | — | |
K-means | c | 随机选取 | LDP-SC | k/c | (3,35)/— |
SNN-DPC | k | (3,35) | MNNGC | k/c | (3,35)/— |
Tab. 3 Parameter settings for each algorithm
算法 | 参数 | 取值范围 | 算法 | 参数 | 取值范围 |
---|---|---|---|---|---|
AGNES | c | — | SC | c | — |
DPC | (0.5%,2%) | CTCEHC | c | — | |
K-means | c | 随机选取 | LDP-SC | k/c | (3,35)/— |
SNN-DPC | k | (3,35) | MNNGC | k/c | (3,35)/— |
算法 | Jain | R15 | Spiral | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 2 | — | 0.946 | 0.696 | 0.779 | 0.922 | — | 15 | — | 0.995 | 0.992 | 0.989 | 0.990 | — | 3 | — | 0.369 | -0.003 | -0.002 | 0.357 |
DPC | 2% | — | — | 0.740 | 0.618 | 0.716 | 0.882 | 2% | — | — | 0.997 | 0.994 | 0.993 | 0.993 | 2% | — | — | 1.000 | 1.000 | 1.000 | 1.000 |
K‑means | — | 2 | — | 0.786 | 0.368 | 0.324 | 0.701 | — | 15 | — | 0.927 | 0.956 | 0.898 | 0.906 | — | 3 | — | 0.349 | -0.005 | -0.006 | 0.328 |
SNN‑DPC | — | — | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | — | 10 | 0.997 | 0.994 | 0.993 | 0.993 | — | — | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
SC | — | 2 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | — | 0.996 | 0.994 | 0.993 | 0.993 | — | 3 | — | 1.000 | 1.000 | 1.000 | 1.000 |
CTCEHC | — | 2 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | — | 0.980 | 0.972 | 0.959 | 0.961 | — | 3 | — | 0.676 | 0.440 | 0.270 | 0.573 |
LDP‑SC | — | 2 | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | 12 | 0.997 | 0.994 | 0.993 | 0.993 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
MNNGC | — | 2 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | 3 | 0.997 | 0.988 | 0.987 | 0.988 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
算法 | 4k2_far | Circle | Happy | ||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.649 | 0.297 | 0.164 | 0.589 | — | 3 | — | 0.748 | 0.554 | 0.429 | 0.633 |
DPC | 2% | — | — | 1.000 | 1.000 | 1.000 | 1.000 | 2% | — | — | 0.452 | 0.138 | 0.049 | 0.466 | 2% | — | — | 0.797 | 0.584 | 0.488 | 0.665 |
K‑means | — | 4 | — | 0.873 | 0.783 | 0.624 | 0.744 | — | 3 | — | 0.482 | 0.157 | 0.053 | 0.403 | — | 3 | — | 0.793 | 0.580 | 0.481 | 0.661 |
SNN‑DPC | — | — | 17 | 0.998 | 0.990 | 0.997 | 0.998 | — | — | 33 | 1.000 | 1.000 | 1.000 | 1.000 | — | — | 10 | 1.000 | 1.000 | 1.000 | 1.000 |
SC | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.632 | 0.332 | 0.199 | 0.599 | — | 3 | — | 0.828 | 0.568 | 0.462 | 0.650 |
CTCEHC | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.692 | 0.378 | 0.253 | 0.619 | — | 3 | — | 0.771 | 0.558 | 0.445 | 0.640 |
LDP‑SC | — | 4 | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
MNNGC | — | 4 | 6 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
算法 | Compound | PathBased | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 6 | — | 0.875 | 0.831 | 0.803 | 0.862 | — | 3 | — | 0.730 | 0.518 | 0.444 | 0.653 | |||||||
DPC | 2% | — | — | 0.832 | 0.857 | 0.783 | 0.855 | 2% | — | — | 0.733 | 0.536 | 0.453 | 0.659 | |||||||
K‑means | — | 6 | — | 0.832 | 0.715 | 0.509 | 0.621 | — | 3 | — | 0.743 | 0.543 | 0.462 | 0.662 | |||||||
SNN‑DPC | — | — | 17 | 0.870 | 0.897 | 0.850 | 0.896 | — | — | 12 | 0.833 | 0.591 | 0.575 | 0.720 | |||||||
SC | — | 6 | — | 0.809 | 0.763 | 0.531 | 0.640 | — | 3 | — | 0.808 | 0.765 | 0.684 | 0.792 | |||||||
CTCEHC | — | 6 | — | 0.729 | 0.643 | 0.448 | 0.572 | — | 3 | — | 0.560 | 0.337 | 0.150 | 0.540 | |||||||
LDP‑SC | — | 6 | 12 | 0.614 | 0.728 | 0.546 | 0.649 | — | 3 | 8 | 0.797 | 0.594 | 0.528 | 0.695 | |||||||
MNNGC | — | 6 | 9 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 5 | 0.893 | 0.758 | 0.729 | 0.825 |
Tab. 4 Performance comparison of clustering algorithms on synthetic datasets
算法 | Jain | R15 | Spiral | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 2 | — | 0.946 | 0.696 | 0.779 | 0.922 | — | 15 | — | 0.995 | 0.992 | 0.989 | 0.990 | — | 3 | — | 0.369 | -0.003 | -0.002 | 0.357 |
DPC | 2% | — | — | 0.740 | 0.618 | 0.716 | 0.882 | 2% | — | — | 0.997 | 0.994 | 0.993 | 0.993 | 2% | — | — | 1.000 | 1.000 | 1.000 | 1.000 |
K‑means | — | 2 | — | 0.786 | 0.368 | 0.324 | 0.701 | — | 15 | — | 0.927 | 0.956 | 0.898 | 0.906 | — | 3 | — | 0.349 | -0.005 | -0.006 | 0.328 |
SNN‑DPC | — | — | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | — | 10 | 0.997 | 0.994 | 0.993 | 0.993 | — | — | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
SC | — | 2 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | — | 0.996 | 0.994 | 0.993 | 0.993 | — | 3 | — | 1.000 | 1.000 | 1.000 | 1.000 |
CTCEHC | — | 2 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | — | 0.980 | 0.972 | 0.959 | 0.961 | — | 3 | — | 0.676 | 0.440 | 0.270 | 0.573 |
LDP‑SC | — | 2 | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | 12 | 0.997 | 0.994 | 0.993 | 0.993 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
MNNGC | — | 2 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 15 | 3 | 0.997 | 0.988 | 0.987 | 0.988 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
算法 | 4k2_far | Circle | Happy | ||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.649 | 0.297 | 0.164 | 0.589 | — | 3 | — | 0.748 | 0.554 | 0.429 | 0.633 |
DPC | 2% | — | — | 1.000 | 1.000 | 1.000 | 1.000 | 2% | — | — | 0.452 | 0.138 | 0.049 | 0.466 | 2% | — | — | 0.797 | 0.584 | 0.488 | 0.665 |
K‑means | — | 4 | — | 0.873 | 0.783 | 0.624 | 0.744 | — | 3 | — | 0.482 | 0.157 | 0.053 | 0.403 | — | 3 | — | 0.793 | 0.580 | 0.481 | 0.661 |
SNN‑DPC | — | — | 17 | 0.998 | 0.990 | 0.997 | 0.998 | — | — | 33 | 1.000 | 1.000 | 1.000 | 1.000 | — | — | 10 | 1.000 | 1.000 | 1.000 | 1.000 |
SC | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.632 | 0.332 | 0.199 | 0.599 | — | 3 | — | 0.828 | 0.568 | 0.462 | 0.650 |
CTCEHC | — | 4 | — | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | — | 0.692 | 0.378 | 0.253 | 0.619 | — | 3 | — | 0.771 | 0.558 | 0.445 | 0.640 |
LDP‑SC | — | 4 | 12 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
MNNGC | — | 4 | 6 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
算法 | Compound | PathBased | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 6 | — | 0.875 | 0.831 | 0.803 | 0.862 | — | 3 | — | 0.730 | 0.518 | 0.444 | 0.653 | |||||||
DPC | 2% | — | — | 0.832 | 0.857 | 0.783 | 0.855 | 2% | — | — | 0.733 | 0.536 | 0.453 | 0.659 | |||||||
K‑means | — | 6 | — | 0.832 | 0.715 | 0.509 | 0.621 | — | 3 | — | 0.743 | 0.543 | 0.462 | 0.662 | |||||||
SNN‑DPC | — | — | 17 | 0.870 | 0.897 | 0.850 | 0.896 | — | — | 12 | 0.833 | 0.591 | 0.575 | 0.720 | |||||||
SC | — | 6 | — | 0.809 | 0.763 | 0.531 | 0.640 | — | 3 | — | 0.808 | 0.765 | 0.684 | 0.792 | |||||||
CTCEHC | — | 6 | — | 0.729 | 0.643 | 0.448 | 0.572 | — | 3 | — | 0.560 | 0.337 | 0.150 | 0.540 | |||||||
LDP‑SC | — | 6 | 12 | 0.614 | 0.728 | 0.546 | 0.649 | — | 3 | 8 | 0.797 | 0.594 | 0.528 | 0.695 | |||||||
MNNGC | — | 6 | 9 | 1.000 | 1.000 | 1.000 | 1.000 | — | 3 | 5 | 0.893 | 0.758 | 0.729 | 0.825 |
算法 | Glass | Ionosphere | Knowledge | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 6 | — | 0.383 | 0.067 | 0.02 | 0.493 | — | 2 | — | 0.644 | 0.003 | 0.005 | 0.733 | — | 4 | — | 0.616 | 0.303 | 0.267 | 0.513 |
DPC | 2% | — | — | 0.477 | 0.232 | 0.155 | 0.418 | 1% | — | — | 0.678 | 0.082 | 0.123 | 0.582 | 2% | — | — | 0.512 | 0.112 | 0.129 | 0.462 |
K-means | — | 6 | — | 0.495 | 0.253 | 0.165 | 0.373 | — | 2 | — | 0.709 | 0.130 | 0.173 | 0.603 | — | 4 | — | 0.523 | 0.223 | 0.144 | 0.375 |
SNN-DPC | — | — | 8 | 0.495 | 0.274 | 0.221 | 0.406 | — | — | 8 | 0.641 | 0.000 | 0.000 | 0.734 | — | — | 7 | 0.529 | 0.189 | 0.169 | 0.478 |
SC | — | 6 | — | 0.391 | 0.124 | 0.054 | 0.494 | — | 2 | — | 0.597 | 0.084 | -0.047 | 0.637 | — | 4 | — | 0.477 | 0.091 | 0.073 | 0.320 |
CTCEHC | — | 6 | — | 0.608 | 0.321 | 0.215 | 0.420 | — | 2 | — | 0.735 | 0.110 | 0.129 | 0.547 | — | 4 | — | 0.541 | 0.159 | 0.120 | 0.371 |
LDP-SC | — | 6 | 8 | 0.421 | 0.313 | 0.200 | 0.395 | — | 2 | 8 | 0.672 | 0.072 | 0.115 | 0.580 | — | 4 | 12 | 0.529 | 0.186 | 0.201 | 0.446 |
MNNGC | — | 6 | 6 | 0.514 | 0.370 | 0.234 | 0.554 | — | 2 | 5 | 0.725 | 0.135 | 0.188 | 0.700 | — | 4 | 4 | 0.620 | 0.366 | 0.301 | 0.513 |
算法 | Planning | Flags | Haberman-normal | ||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 2 | — | 0.714 | -0.005 | -0.007 | 0.762 | — | 8 | — | 0.356 | 0.029 | 0.007 | 0.408 | — | 2 | — | 0.735 | -0.061 | 0.003 | 0.774 |
DPC | 2% | — | — | 0.462 | -0.004 | -0.005 | 0.549 | 2% | — | — | 0.340 | 0.037 | 0.052 | 0.313 | 1% | — | — | 0.533 | -0.003 | -0.002 | 0.555 |
K-means | — | 2 | — | 0.714 | -0.004 | -0.004 | 0.539 | — | 8 | — | 0.412 | 0.076 | 0.136 | 0.323 | — | 2 | — | 0.735 | -0.002 | -0.004 | 0.551 |
SNN-DPC | — | — | 15 | 0.506 | -0.002 | -0.008 | 0.543 | — | — | 8 | 0.351 | 0.088 | 0.063 | 0.293 | — | — | 6 | 0.735 | 0.000 | 0.000 | 0.781 |
SC | — | 2 | — | 0.722 | 0.012 | 0.014 | 0.553 | — | 8 | — | 0.314 | 0.013 | 0.005 | 0.445 | — | 2 | — | 0.714 | -0.001 | -0.003 | 0.551 |
CTCEHC | — | 2 | — | 0.714 | 0.002 | -0.036 | 0.669 | — | 8 | — | 0.397 | 0.109 | 0.053 | 0.216 | — | 2 | — | 0.735 | -0.001 | 0.015 | 0.627 |
LDP-SC | — | 2 | 12 | 0.681 | -0.007 | -0.009 | 0.716 | — | 8 | 6 | 0.284 | 0.086 | 0.027 | 0.209 | — | 2 | 10 | 0.513 | -0.002 | -0.005 | 0.553 |
MNNGC | — | 2 | 10 | 0.725 | 0.020 | 0.042 | 0.763 | — | 8 | 6 | 0.417 | 0.083 | 0.131 | 0.334 | — | 2 | 6 | 0.737 | -0.005 | 0.025 | 0.730 |
算法 | Balance-scale | Bupa | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 3 | — | 0.602 | 0.070 | 0.079 | 0.430 | — | 2 | — | 0.580 | 0.009 | -0.010 | 0.687 | |||||||
DPC | 2% | — | — | 0.334 | 0.074 | 0.041 | 0.474 | 2% | — | — | 0.452 | -0.002 | 0.001 | 0.559 | |||||||
K-means | — | 3 | — | 0.630 | 0.073 | 0.090 | 0.436 | — | 2 | — | 0.580 | -0.002 | -0.005 | 0.641 | |||||||
SNN-DPC | — | — | 8 | 0.397 | 0.088 | 0.003 | 0.616 | — | — | 8 | 0.420 | 0.000 | 0.000 | 0.715 | |||||||
SC | — | 3 | — | 0.606 | 0.085 | 0.103 | 0.443 | — | 2 | — | 0.579 | -0.001 | -0.002 | 0.712 | |||||||
CTCEHC | — | 3 | — | 0.461 | 0.000 | 0.000 | 0.656 | — | 2 | — | 0.580 | -0.001 | 0.006 | 0.671 | |||||||
LDP-SC | — | 3 | 6 | 0.525 | 0.071 | 0.101 | 0.448 | — | 2 | 10 | 0.559 | 0.001 | -0.007 | 0.678 | |||||||
MNNGC | — | 3 | 12 | 0.693 | 0.160 | 0.179 | 0.507 | — | 2 | 6 | 0.587 | 0.010 | 0.007 | 0.711 | |||||||
算法 | Breast-cancer | Ecoli | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 2 | — | 0.706 | 0.007 | 0.010 | 0.762 | — | 8 | — | 0.664 | 0.555 | 0.462 | 0.681 | |||||||
DPC | 2% | — | — | 0.703 | 0.000 | 0.000 | 0.762 | 1% | — | — | 0.557 | 0.605 | 0.649 | 0.767 | |||||||
K-means | — | 2 | — | 0.703 | 0.022 | 0.001 | 0.546 | — | 8 | — | 0.756 | 0.543 | 0.373 | 0.518 | |||||||
SNN-DPC | — | — | 8 | 0.490 | -0.001 | -0.003 | 0.537 | — | — | 6 | 0.717 | 0.555 | 0.527 | 0.705 | |||||||
SC | — | 2 | — | 0.695 | 0.008 | -0.027 | 0.725 | — | 8 | — | 0.764 | 0.529 | 0.409 | 0.547 | |||||||
CTCEHC | — | 2 | — | 0.703 | -0.003 | 0.010 | 0.654 | — | 8 | — | 0.735 | 0.472 | 0.268 | 0.424 | |||||||
LDP-SC | — | 2 | 6 | 0.619 | -0.003 | -0.013 | 0.634 | — | 8 | 8 | 0.571 | 0.558 | 0.393 | 0.532 | |||||||
MNNGC | — | 2 | 8 | 0.707 | 0.000 | 0.025 | 0.751 | — | 8 | 4 | 0.767 | 0.633 | 0.684 | 0.770 |
Tab. 5 Performance comparison of clustering algorithms on UCI datasets
算法 | Glass | Ionosphere | Knowledge | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 6 | — | 0.383 | 0.067 | 0.02 | 0.493 | — | 2 | — | 0.644 | 0.003 | 0.005 | 0.733 | — | 4 | — | 0.616 | 0.303 | 0.267 | 0.513 |
DPC | 2% | — | — | 0.477 | 0.232 | 0.155 | 0.418 | 1% | — | — | 0.678 | 0.082 | 0.123 | 0.582 | 2% | — | — | 0.512 | 0.112 | 0.129 | 0.462 |
K-means | — | 6 | — | 0.495 | 0.253 | 0.165 | 0.373 | — | 2 | — | 0.709 | 0.130 | 0.173 | 0.603 | — | 4 | — | 0.523 | 0.223 | 0.144 | 0.375 |
SNN-DPC | — | — | 8 | 0.495 | 0.274 | 0.221 | 0.406 | — | — | 8 | 0.641 | 0.000 | 0.000 | 0.734 | — | — | 7 | 0.529 | 0.189 | 0.169 | 0.478 |
SC | — | 6 | — | 0.391 | 0.124 | 0.054 | 0.494 | — | 2 | — | 0.597 | 0.084 | -0.047 | 0.637 | — | 4 | — | 0.477 | 0.091 | 0.073 | 0.320 |
CTCEHC | — | 6 | — | 0.608 | 0.321 | 0.215 | 0.420 | — | 2 | — | 0.735 | 0.110 | 0.129 | 0.547 | — | 4 | — | 0.541 | 0.159 | 0.120 | 0.371 |
LDP-SC | — | 6 | 8 | 0.421 | 0.313 | 0.200 | 0.395 | — | 2 | 8 | 0.672 | 0.072 | 0.115 | 0.580 | — | 4 | 12 | 0.529 | 0.186 | 0.201 | 0.446 |
MNNGC | — | 6 | 6 | 0.514 | 0.370 | 0.234 | 0.554 | — | 2 | 5 | 0.725 | 0.135 | 0.188 | 0.700 | — | 4 | 4 | 0.620 | 0.366 | 0.301 | 0.513 |
算法 | Planning | Flags | Haberman-normal | ||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | |
AGNES | — | 2 | — | 0.714 | -0.005 | -0.007 | 0.762 | — | 8 | — | 0.356 | 0.029 | 0.007 | 0.408 | — | 2 | — | 0.735 | -0.061 | 0.003 | 0.774 |
DPC | 2% | — | — | 0.462 | -0.004 | -0.005 | 0.549 | 2% | — | — | 0.340 | 0.037 | 0.052 | 0.313 | 1% | — | — | 0.533 | -0.003 | -0.002 | 0.555 |
K-means | — | 2 | — | 0.714 | -0.004 | -0.004 | 0.539 | — | 8 | — | 0.412 | 0.076 | 0.136 | 0.323 | — | 2 | — | 0.735 | -0.002 | -0.004 | 0.551 |
SNN-DPC | — | — | 15 | 0.506 | -0.002 | -0.008 | 0.543 | — | — | 8 | 0.351 | 0.088 | 0.063 | 0.293 | — | — | 6 | 0.735 | 0.000 | 0.000 | 0.781 |
SC | — | 2 | — | 0.722 | 0.012 | 0.014 | 0.553 | — | 8 | — | 0.314 | 0.013 | 0.005 | 0.445 | — | 2 | — | 0.714 | -0.001 | -0.003 | 0.551 |
CTCEHC | — | 2 | — | 0.714 | 0.002 | -0.036 | 0.669 | — | 8 | — | 0.397 | 0.109 | 0.053 | 0.216 | — | 2 | — | 0.735 | -0.001 | 0.015 | 0.627 |
LDP-SC | — | 2 | 12 | 0.681 | -0.007 | -0.009 | 0.716 | — | 8 | 6 | 0.284 | 0.086 | 0.027 | 0.209 | — | 2 | 10 | 0.513 | -0.002 | -0.005 | 0.553 |
MNNGC | — | 2 | 10 | 0.725 | 0.020 | 0.042 | 0.763 | — | 8 | 6 | 0.417 | 0.083 | 0.131 | 0.334 | — | 2 | 6 | 0.737 | -0.005 | 0.025 | 0.730 |
算法 | Balance-scale | Bupa | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 3 | — | 0.602 | 0.070 | 0.079 | 0.430 | — | 2 | — | 0.580 | 0.009 | -0.010 | 0.687 | |||||||
DPC | 2% | — | — | 0.334 | 0.074 | 0.041 | 0.474 | 2% | — | — | 0.452 | -0.002 | 0.001 | 0.559 | |||||||
K-means | — | 3 | — | 0.630 | 0.073 | 0.090 | 0.436 | — | 2 | — | 0.580 | -0.002 | -0.005 | 0.641 | |||||||
SNN-DPC | — | — | 8 | 0.397 | 0.088 | 0.003 | 0.616 | — | — | 8 | 0.420 | 0.000 | 0.000 | 0.715 | |||||||
SC | — | 3 | — | 0.606 | 0.085 | 0.103 | 0.443 | — | 2 | — | 0.579 | -0.001 | -0.002 | 0.712 | |||||||
CTCEHC | — | 3 | — | 0.461 | 0.000 | 0.000 | 0.656 | — | 2 | — | 0.580 | -0.001 | 0.006 | 0.671 | |||||||
LDP-SC | — | 3 | 6 | 0.525 | 0.071 | 0.101 | 0.448 | — | 2 | 10 | 0.559 | 0.001 | -0.007 | 0.678 | |||||||
MNNGC | — | 3 | 12 | 0.693 | 0.160 | 0.179 | 0.507 | — | 2 | 6 | 0.587 | 0.010 | 0.007 | 0.711 | |||||||
算法 | Breast-cancer | Ecoli | |||||||||||||||||||
dc | c | k | Acc | AMI | ARI | FMI | dc | c | k | Acc | AMI | ARI | FMI | ||||||||
AGNES | — | 2 | — | 0.706 | 0.007 | 0.010 | 0.762 | — | 8 | — | 0.664 | 0.555 | 0.462 | 0.681 | |||||||
DPC | 2% | — | — | 0.703 | 0.000 | 0.000 | 0.762 | 1% | — | — | 0.557 | 0.605 | 0.649 | 0.767 | |||||||
K-means | — | 2 | — | 0.703 | 0.022 | 0.001 | 0.546 | — | 8 | — | 0.756 | 0.543 | 0.373 | 0.518 | |||||||
SNN-DPC | — | — | 8 | 0.490 | -0.001 | -0.003 | 0.537 | — | — | 6 | 0.717 | 0.555 | 0.527 | 0.705 | |||||||
SC | — | 2 | — | 0.695 | 0.008 | -0.027 | 0.725 | — | 8 | — | 0.764 | 0.529 | 0.409 | 0.547 | |||||||
CTCEHC | — | 2 | — | 0.703 | -0.003 | 0.010 | 0.654 | — | 8 | — | 0.735 | 0.472 | 0.268 | 0.424 | |||||||
LDP-SC | — | 2 | 6 | 0.619 | -0.003 | -0.013 | 0.634 | — | 8 | 8 | 0.571 | 0.558 | 0.393 | 0.532 | |||||||
MNNGC | — | 2 | 8 | 0.707 | 0.000 | 0.025 | 0.751 | — | 8 | 4 | 0.767 | 0.633 | 0.684 | 0.770 |
1 | ROSENBERGER C, CHEHDI K. Unsupervised clustering method with optimal estimation of the number of clusters: application to image segmentation[C] // Proceedings of the 15th International Conference on Pattern Recognition. Piscataway: IEEE, 2000: 656-659. 10.1109/ICPR.2000.905473 |
2 | LLOBELL F, VIGNEAU E, QANNARI E M. Clustering datasets by means of CLUSTATIS with identification of atypical datasets. Application to sensometrics[J]. Food Quality and Preference, 2019, 75: 97-104. 10.1016/j.foodqual.2019.02.017 |
3 | OKTAR Y, TURKAN M. A review of sparsity-based clustering methods[J]. Signal Processing, 2018, 148: 20-30. 10.1016/j.sigpro.2018.02.010 |
4 | AGGARWALC C, REDDY C K. Data Clustering: Algorithms and Applications[M]. Boca Raton: CRC Press, 2013:596-627. |
5 | POTHULA K R, SMYRNOVA D, SCHRÖDER G F. Clustering cryo-EM images of helical protein polymers for helical reconstructions[J]. Ultramicroscopy, 2018, 203: 132-138. 10.1016/j.ultramic.2018.12.009 |
6 | HAN J, KAMBER M. 数据挖掘:概念与技术[M].范明,孟小峰,译.2版.北京: 机械工业出版社,2007: 263-266. |
HAN J, KAMBER M. Data Mining: Concepts and Techniques [M]. FAN M, MENG X F, translated. 2nd ed. Beijing: China Machine Press, 2007: 263-266. | |
7 | JAIN A K. Data clustering: 50 years beyond K-means[J]. Pattern Recognition Letters,2010, 31(8):651-666. 10.1016/j.patrec.2009.09.011 |
8 | SAXENA A, PRASAD M, GUPTA A, et al. A review of clustering techniques and developments[J]. Neurocomputing, 2017, 267: 664-681. 10.1016/j.neucom.2017.06.053 |
9 | 王巧玲,乔非,蒋友好.基于聚合距离参数的改进K-means算法[J].计算机应用,2019,39(9):2586-2590. 10.11772/j.issn.1001-9081.2019030485 |
WANG Q L, QIAO F, JIANG Y H. Improved K-means algorithm with aggregation distance coefficient[J]. Journal of Computer Applications, 2019, 39(9): 2586-2590. 10.11772/j.issn.1001-9081.2019030485 | |
10 | 张朋,李小林,王李妍.基于DBSCAN的动态邻域密度聚类算法[J].计算机科学,2023,50(6A):220400127. 10.11896/jsjkx.220400127 |
ZHANG P, LI X L, WANG L Y. Dynamic neighborhood density clustering algorithm based on DBSCAN [J]. Computer Science, 2023,50(6A):220400127. 10.11896/jsjkx.220400127 | |
11 | 刘静姝,王莉,刘惊雷.无需特征分解的快速谱聚类算法[J].计算机应用,2020,40(12):3413-3422. 10.11772/j.issn.1001-9081.2020061040 |
LIU J S, WANG L, LIU J L. Fast spectral clustering algorithm without eigen-decomposition[J]. Journal of Computer Applications, 2020,40(12):3413-3422. 10.11772/j.issn.1001-9081.2020061040 | |
12 | KANUNGO T, MOUNT D M, NETANYAHU N S, et al. An efficient k-means clustering algorithm: analysis and implementation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002,24(7):881-892. 10.1109/tpami.2002.1017616 |
13 | 杨静,高嘉伟,梁吉业,等. 基于数据场的改进DBSCAN聚类算法[J]. 计算机科学与探索, 2012, 6 (10): 903-911. 10.3778/j.issn.1673-9418.2012.10.005 |
YANG J, GAO J W, LIANG J Y, et al. An improved DBSCAN clustering algorithm based on data field[J]. Journal of Frontiers of Computer Science and Technology, 2012, 6(10): 903-911. 10.3778/j.issn.1673-9418.2012.10.005 | |
14 | RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496. 10.1126/science.1242072 |
15 | LIU R, WANG H, YU X. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450: 200-226. 10.1016/j.ins.2018.03.031 |
16 | LIN C-R, CHEN M-S. Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging[J].IEEE Transactions on Knowledge and Data Engineering, 2005,17(2): 145-159. 10.1109/tkde.2005.21 |
17 | NG A Y, JORDAN M I, WEISS Y. On spectral clustering: analysis and an algorithm[C]// Proceedings of the 14th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2002:849-856. |
18 | KARYPIS G, HAN E H, KUMAR V. Chameleon: hierarchical clustering using dynamic modeling[J]. Computer,1999, 32(8):68-75. 10.1109/2.781637 |
19 | ZHONG C, MIAO D, FRÄNTI P. Minimum spanning tree based split-and-merge: a hierarchical clustering method[J]. Information Sciences, 2011, 181(16): 3397-3410. 10.1016/j.ins.2011.04.013 |
20 | MA Y, LIN H, WANG Y, et al. A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint[J]. Information Sciences, 2021, 557: 194-219. 10.1016/j.ins.2020.12.016 |
21 | LONG Z, GAO Y, MENG H, et al. Clustering based on local density peaks and graph cut[J]. Information Sciences, 2022, 600: 263-286. 10.1016/j.ins.2022.03.091 |
22 | 龙建武,王强.反向近邻构造连通图的聚类算法[J/OL].计算机科学与探索, 2022 [2023-06-01]. . |
LONG J W, WANG Q. Clustering algorithm of connected graphs constructed by reverse nearest neighbor[J/OL]. Journal of Frontiers of Computer Science and Technology, 2022 [2023-06-01]. . | |
23 | HUANG D, WANG C-D, PENG H, et al. Enhanced ensemble clustering via fast propagation of cluster-wise similarities[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51(1):508-520. 10.1109/tsmc.2018.2876202 |
24 | 白璐,赵鑫,孔钰婷,等.谱聚类算法研究综述[J].计算机工程与应用,2021,57(14): 15-26. 10.3778/j.issn.1002-8331.2103-0547 |
BAI L, ZHAO X, KONG Y T, et al. Review of spectral clustering algorithms [J]. Computer Engineering and Applications, 2021, 57(14): 15-26. 10.3778/j.issn.1002-8331.2103-0547 | |
25 | 王淞,黄浩,余果,等.一种基于 k 近邻图的稀有类检测算法[J].软件学报,2016,27(9):2320-2331. 10.13328/j.cnki.jos.004872 |
WANG S, HUANG H, YU G, et al. Rare category detection algorithm based on k‑nearest neighbor graphs [J]. Journal of Software,2016,27(9):2320-2331. 10.13328/j.cnki.jos.004872 | |
26 | 冯骥,冉瑞生,魏延.基于自然邻居邻域图的无参数离群检测算法[J].智能系统学报, 2019, 14(5): 998-1006. 10.11992/tis.201809032 |
FENG J, RAN R S, WEI Y. A parameter-free outlier detection algorithm based on natural neighborhood graph [J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 998-1006. 10.11992/tis.201809032 | |
27 | 周志华.机器学习[M].北京:清华大学出版社,2016:319-327. |
ZHOU Z H. Machine Learning [M]. Beijing: Tsinghua University Press, 2016: 319-327. | |
28 | CHANG H, D-Y YEUNG. Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41 (1): 191-203. 10.1016/j.patcog.2007.04.010 |
29 | VEENMAN C J, REINDERS M J T, BACKER E. A maximum variance cluster algorithm [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(9): 1273-1280. 10.1109/tpami.2002.1033218 |
30 | ZHANG R, DU T, QU S N, et al. Adaptive density-based clustering algorithm with shared KNN conflict game [J]. Information Sciences, 2021, 565: 344-369. 10.1016/j.ins.2021.02.017 |
[1] | Zhongping ZHANG, Xin GUO, Yuting ZHANG, Ruibo ZHANG. Outlier detection algorithm based on hologram stationary distribution factor [J]. Journal of Computer Applications, 2023, 43(6): 1705-1712. |
[2] | Jingfeng GUO, Hui DONG, Tingwei ZHANG, Xiao CHEN. Representation learning for topic-attention network [J]. Journal of Computer Applications, 2020, 40(2): 441-447. |
[3] | SONG Yan, YIN Jun. Multi-view spectral clustering algorithm based on shared nearest neighbor [J]. Journal of Computer Applications, 2020, 40(11): 3211-3216. |
[4] | FAN Zhongxin, WANG Xing, MIAO Chunsheng. Improved BIRCH clustering algorithm based on connectivity distance and intensity [J]. Journal of Computer Applications, 2019, 39(4): 1027-1031. |
[5] | WANG Jun, ZHOU Kai, CHENG Yong. Mixed density peaks clustering algorithm [J]. Journal of Computer Applications, 2019, 39(2): 403-408. |
[6] | YAN Hongwen, SHENG Chenggong. Short-term bus load forecasting based on hierarchical clustering method and extreme learning machine [J]. Journal of Computer Applications, 2018, 38(8): 2437-2441. |
[7] | WEI Baoguo, GE Ping, WU Hong, WANG Gaofeng, HAN Wenliang. Object tracking algorithm based on static-adaptive appearance model correction [J]. Journal of Computer Applications, 2018, 38(4): 1170-1175. |
[8] | TANG Jiaqi, WU Jingli. Protein function prediction method based on PPI network and machine learning [J]. Journal of Computer Applications, 2018, 38(3): 722-727. |
[9] | HU Jianghua WANG Wenzhong LUO Bin TANG Jin. Pedestrian segmentation based on Graph Cut with shape prior [J]. Journal of Computer Applications, 2014, 34(3): 837-840. |
[10] | SHEN Hua FENG Jing YIN Min MA Weijun JIANG Lei. Optimization methods for application layer multicast [J]. Journal of Computer Applications, 2013, 33(12): 3389-3393. |
[11] | ZHAO Yanli WANG Xing. Steganalysis of JPEG images based on bilateral transition probability matrix [J]. Journal of Computer Applications, 2013, 33(04): 1074-1076. |
[12] | BAI Xue REN Xiaoling LIU Xiyu. Hierarchical clustering algorithm based on sticker and 2-armed DNA model [J]. Journal of Computer Applications, 2013, 33(02): 308-315. |
[13] | Jun-fang JIA. HC_AL: New active learning method based on hierarchical clustering [J]. Journal of Computer Applications, 2011, 31(08): 2134-2137. |
[14] | TANG ZhouWen YE DongYi. Dissimilar attribute reductions based on hierarchical clustering method [J]. Journal of Computer Applications, 2009, 29(2): 419-420. |
[15] | . Campus-oriented stepwise-optimal hierarchical clustering algorithm of IP address [J]. Journal of Computer Applications, 2007, 27(8): 1862-1864. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||