基于最大均值差异的子空间高斯混合模型聚类集成算法

doi:10.11772/j.issn.1001-9081.2024070943

《计算机应用》唯一官方网站 ›› 2025, Vol. 45 ›› Issue (6): 1712-1723.DOI: 10.11772/j.issn.1001-9081.2024070943

• 第十二届CCF大数据学术会议 • 上一篇

基于最大均值差异的子空间高斯混合模型聚类集成算法

何玉林¹^,², 李旭¹(), 贺颖婷², 崔来中¹^,², 黄哲学¹^,²

^1.人工智能与数字经济广东省实验室（深圳），广东深圳 518107
^2.深圳大学计算机与软件学院，广东深圳 518060

收稿日期:2024-07-08 修回日期:2024-08-02 接受日期:2024-08-22 发布日期:2024-09-02 出版日期:2025-06-10
通讯作者: 李旭
作者简介:何玉林（1982—），男，河北衡水人，研究员，博士，CCF会员，主要研究方向：大数据近似计算、多样本统计、数据挖掘、机器学习
李旭（1996—），男，广东汕头人，工程师，硕士，CCF会员，主要研究方向：大数据分布式计算、数据挖掘、机器学习 lixu@gml.ac.cn
贺颖婷（1997—）女，广东深圳人，工程师，硕士，主要研究方向：数据挖掘、机器学习
崔来中（1984—），男，吉林白山人，教授，博士，CCF会员，主要研究方向：互联网体系结构、边缘计算、AI驱动的网络优化
黄哲学（1959—），男，黑龙江哈尔滨人，教授，博士，CCF会员，主要研究方向：新型算力网络的智能计算、大数据近似计算、数据挖掘、机器学习。
基金资助:
广东省自然科学基金面上项目(2023A1515011667);深圳市科技重大专项(KJZD20230923114809020);广东省基础与应用基础研究基金粤深联合基金重点项目(2023B1515120020);深圳市基础研究重点项目(JCYJ20220818100205012)

Subspace Gaussian mixture model clustering ensemble algorithm based on maximum mean discrepancy

Yulin HE¹^,², Xu LI¹(), Yingting HE², Laizhong CUI¹^,², Zhexue HUANG¹^,²

^1.Guangdong Laboratory of Artificial Intelligence and Digital Economy （SZ），Shenzhen Guangdong 518107，China
^2.College of Computer Science and Software Engineering，Shenzhen University，Shenzhen Guangdong 518060，China

Received:2024-07-08 Revised:2024-08-02 Accepted:2024-08-22 Online:2024-09-02 Published:2025-06-10
Contact: Xu LI
About author:HE Yulin， born in 1982， Ph. D.， research fellow. His research interests include approximate computation of big data， multi-sample statistics， data mining， machine learning.
LI Xu， born in 1996， M. S.， engineer. His research interests include distributed computation of big data， data mining， machine learning.
HE Yingting， born in 1997， M. S.， engineer. Her research interests include data mining， machine learning.
CUI Laizhong， born in 1984， Ph. D.， professor. His research interests include Internet architecture， edge computing， AI-driven network optimization.
HUANG Zhexue， born in 1959， Ph. D.， professor. His research interests include intelligent computation of new computational power network， approximation computation of big data， data mining， machine learning.
Supported by:
Natural Science Foundation of Guangdong Province(2023A1515011667);Science and Technology Major Project of Shenzhen(KJZD20230923114809020);Key Project of Guangdong Shenzhen Joint Fund of Guangdong Basic and Applied Basic Research Foundation(2023B1515120020);Key Basic Research Foundation of Shenzhen(JCYJ20220818100205012)

摘要/Abstract

摘要：

针对高斯混合模型（GMM）聚类算法在处理大规模高维数据聚类时出现的性能受限和参数敏感的问题，提出一种基于最大均值差异（MMD）的子空间GMM聚类集成（SGMM-CE）算法。首先，对原始大规模高维数据集进行随机样本划分（RSP）以得到多个数据子集，从样本量的角度缩小聚类问题的规模；其次，根据特征对最优GMM构件数的影响，在每一个数据子集对应的高维特征空间中进行子空间学习，得到每个高维特征空间对应的多个低维特征子空间，并在各个子空间上进行GMM聚类，从而得到一系列异构的GMM；再次，利用所提出的平均共享隶属概率（ASAP），重标记与融合来自同一个数据子集的不同特征子空间上的聚类结果；最后，利用扩展的子空间MMD（SubMMD）作为不同数据子集的聚类结果中2个簇之间的分布一致性的度量准则，据此重标记并融合这些数据子集的聚类结果，进而得到原始数据集的最终聚类集成结果。通过详尽的实验验证SGMM-CE算法的有效性，实验结果显示，相较于对比算法中最好的元簇聚类算法（MCLA），SGMM-CE算法在选用的数据集上的平均标准化互信息（NMI）、聚类精度（CA）和调整兰德系数（ARI）值分别提升了19%，20%和52%。此外，可行性和合理性的实验结果证实了SGMM-CE算法的参数收敛性与时间高效性，表明该算法具备高效处理大规模高维数据聚类问题的能力。

关键词: 无监督学习, 集成学习, 子空间学习, 最大均值差异, 高斯混合模型

Abstract:

To address the problems of limited capability and parameter sensitivity of Gaussian Mixture Model （GMM） clustering algorithms in processing large-scale high-dimensional data clustering， a Subspace GMM Clustering Ensemble （SGMM-CE） algorithm based on Maximum Mean Discrepancy （MMD） was proposed. Firstly， Random Sample Partition （RSP） was performed to the original large-scale high-dimensional dataset to obtain multiple subsets of data， thereby reducing the size of clustering problem from the perspective of sample size. Secondly， subspace learning was performed in the high-dimensional feature space corresponding to each subset of data by considering the influence of features on optimal number of GMM components， so that multiple low-dimensional feature subspaces corresponding to each high-dimensional feature space were obtained， and then GMM clustering was conducted on each subspace to obtain a series of heterogeneous GMMs. Thirdly， GMM clustering results of different subspaces from the same subset of data were relabeled and merged on the basis of the proposed Average Shared Affiliation Probability （ASAP）. Finally， the expanded Subspace MMD （SubMMD） was used as a criterion to measure distributional consistency between two clusters in the clustering results of different subsets of data， so as to relabel and merge clustering results of these subsets of data based on the above， thereby obtaining the final clustering ensemble result of the original dataset. Exhaustive experiments were conducted to validate the effectiveness of SGMM-CE algorithm. Experimental results show that compared with the best-performing comparison algorithm — Meta-CLustering Algorithm （MCLA）， SGMM-CE algorithm increases 19%， 20%， and 52% for Normalized Mutual Information （NMI）， Clustering Accuracy （CA） and Adjusted Rand Index （ARI） values， respectively， on the given clustering datasets. Besides， the feasibility and rationality experimental results reflect that SGMM-CE algorithm has parameter convergence and time efficiency， demonstrating that this algorithm can deal with large-scale high-dimensional data clustering problems effectively.

Key words: unsupervised learning, ensemble learning, subspace learning, Maximum Mean Discrepancy (MMD), Gaussian Mixture Model (GMM)

中图分类号:

TP181

何玉林, 李旭, 贺颖婷, 崔来中, 黄哲学. 基于最大均值差异的子空间高斯混合模型聚类集成算法[J]. 计算机应用, 2025, 45(6): 1712-1723.

Yulin HE, Xu LI, Yingting HE, Laizhong CUI, Zhexue HUANG. Subspace Gaussian mixture model clustering ensemble algorithm based on maximum mean discrepancy[J]. Journal of Computer Applications, 2025, 45(6): 1712-1723.

图/表 9

参考文献 52

1	SAXENA A， PRASAD M， GUPTA A， et al. A review of clustering techniques and developments［J］. Neurocomputing， 2017， 267： 664-681.
2	GOLALIPOUR K， AKBARI E， HAMIDI S S， et al. From clustering to clustering ensemble selection： a review［J］. Engineering Applications of Artificial Intelligence， 2021， 104： No.104388.
3	AHMED M， SERAJ R， ISLAM S M S. The k-means algorithm： a comprehensive survey and performance evaluation［J］. Electronics， 2020， 9（8）： No.1295.
4	McLACHLAN G L， BASFORD K E. Mixture models： inference and applications to clustering ［M］. New York： Marcel Dekker， 1988： 337-338.
5	BOUKERCHE A， ZHENG L， ALFANDI O. Outlier detection： methods， models， and classification ［J］. ACM Computing Survey， 2021， 53（3）： No.55.
6	CHADDAD A. Automated feature extraction in brain tumor by magnetic resonance imaging using Gaussian mixture models［J］. International Journal of Biomedical Imaging， 2015， 2015： No.868031.
7	VIROLI C， McLACHLAN G J. Deep Gaussian mixture models ［J］. Statistics and Computing， 2019， 29： 43-51.
8	章永来，周耀鉴. 聚类算法综述［J］. 计算机应用， 2019， 39（7）： 1869-1882.
	ZHANG Y L， ZHOU Y J. Review of clustering algorithms ［J］. Journal of Computer Applications， 2019， 39（7）： 1869-1882.
9	ZHU X， GOLDBERG A B. Introduction to semi-supervised learning， SLAIML ［M］. Cham： Springer， 2009.
10	王爱平，张功营，刘方. EM算法研究与应用［J］.计算机技术与发展， 2009， 19（9）： 108-110.
	WANG A P， ZHANG G Y， LIU F. Research and application of EM algorithm［J］. Computer Technology and Development， 2009， 19（9）： 108-110.
11	BOUVEYRON C， BRUNET-SAUMARD C. Model-based clustering of high-dimensional data： a review ［J］. Computational Statistics and Data Analysis， 2014， 71： 52-78.
12	WANG D， GUO X， LI S， et al. Robust high dimensional expectation maximization algorithm via trimmed hard thresholding［J］. Machine Learning， 2020， 109（12）： 2283-2311.
13	PRZYBOROWSKI M， ŚLĘZAK D. Approximation of the expectation-maximization algorithm for Gaussian mixture models on big data［C］// Proceedings of the 2022 IEEE International Conference on Big Data. Piscataway： IEEE， 2022： 6256-6260.
14	GRETTON A， BORGWARDT K M， RASCH M J， et al. A kernel two-sample test［J］. Journal of Machine Learning Research， 2012， 13： 723-773.
15	HE Y， HE Y， ZHAN Z， et al. A novel subspace-based GMM clustering ensemble algorithm for high-dimensional data［J］. Chinese Journal of Electronics， 2025， 34（2）： 612-629.
16	SALLOUM S， HUANG J Z， HE Y. Random sample partition： a distributed data model for big data analysis ［J］. IEEE Transactions on Industrial Informatics， 2019， 15（11）： 5846-5854.
17	HUANG D， WANG C D， LAI J H. Locally weighted ensemble clustering ［J］. IEEE Transactions on Cybernetics， 2018， 48（5）： 1460-1473.
18	YANG W， ZHANG Y， WANG H， et al. Hybrid genetic model for clustering ensemble ［J］. Knowledge-Based Systems， 2021， 231： No.107457.
19	BAGHERINIA A， MINAEI-BIDGOLI B， HOSSEINZADEH M， et al. Reliability-based fuzzy clustering ensemble ［J］. Fuzzy Sets and Systems， 2021， 413： 1-28.
20	DU X， HE Y， HUANG J Z. Random sample partition-based clustering ensemble algorithm for big data ［C］// Proceedings of the 2021 IEEE International Conference on Big Data. Piscataway： IEEE， 2021： 5885-5887.
21	JI S， XING R. Clustering ensemble of massive data based on trusted region［C］// Proceedings of the 3rd International Conference on Machine Learning， Big Data and Business Intelligence. Piscataway： IEEE， 2021： 337-340.
22	HE S， LI H， GUO Q， et al. Feature weighted dual random sampling cluster ensemble［C］// Proceedings of the 5th International Conference on Machine Learning and Soft Computing. New York： ACM， 2021： 54-59.
23	ANDERLUCCI L， FORTUNATO F， MONTANARI A. High-dimensional clustering via random projections ［J］. Journal of Classification， 2022， 39（1）： 191-216.
24	FRED A L N， JAIN A K. Combining multiple clusterings using evidence accumulation ［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2005， 27（6）： 835-850.
25	LIU H， WU J， LIU T， et al. Spectral ensemble clustering via weighted K-means： theoretical and practical evidence［J］. IEEE Transactions on Knowledge and Data Engineering， 2017， 29（5）： 1129-1143.
26	JI X， LIU S， YANG L， et al. Clustering ensemble based on approximate accuracy of the equivalence granularity ［J］. Applied Soft Computing， 2022， 129： No.109492.
27	JIA Y， TAO S， WANG R， et al. Ensemble clustering via co-association matrix self-enhancement［J］. IEEE Transactions on Neural Networks and Learning Systems， 2024， 35（8）： 11168-11179.
28	STREHL A， GHOSH J. Cluster ensembles — a knowledge reuse framework for combining multiple partitions［J］. Journal of Machine Learning Research， 2002， 3： 583-617.
29	HUANG D， WANG C D， LAI J H. LWMC： a locally weighted meta-clustering algorithm for ensemble clustering ［C］// Proceedings of the 2017 International Conference on Neural Information Processing， LNCS 10638. Cham： Springer， 2017： 167-176.
30	HAMIDI S S， AKBARI E， MOTAMENI H. Consensus clustering algorithm based on the automatic partitioning similarity graph ［J］. Data and Knowledge Engineering， 2019， 124： No.101754.
31	WANG L， LUO J， WANG H， et al. Markov clustering ensemble［J］. Knowledge-Based Systems， 2022， 251： No.109196.
32	ZHOU P， DU L， LI X. Self-paced consensus clustering with bipartite graph［C］// Proceedings of the 29th International Joint Conferences on Artificial Intelligence. California： ijcai.org， 2020： 2133-2139.
33	DUDOIT S， FRIDLYAND J. Bagging to improve the accuracy of a clustering procedure［J］. Bioinformatics， 2003， 19（9）： 1090-1099.
34	KUHN H W. The Hungarian method for the assignment problem［J］. Naval Research Logistics Quarterly， 1955， 2（1/2）： 83-97.
35	ZHOU Z H， TANG W. Clusterer ensemble ［J］. Knowledge-Based Systems， 2006， 19（1）： 77-83.
36	KHEDAIRIA S， KHADIR M T. A multiple clustering combination approach based on iterative voting process［J］. Journal of King Saud University — Computer and Information Sciences， 2022， 34（1）： 1370-1380.
37	BURTON R J， CUFF S M， MORGAN M P， et al. GeoWaVe： geometric median clustering with weighted voting for ensemble clustering of cytometry data ［J］. Bioinformatics， 2023， 39（1）： No.btac751.
38	TOPCHY A， JAIN A K， PUNCH W. A mixture model for clustering ensembles［C］// Proceedings of the 2004 SIAM International Conference on Data Mining. Philadelphia， PA： SIAM， 2004： 379-390.
39	TUMER K， AGOGINO A K. Ensemble clustering with voting active clusters ［J］. Pattern Recognition Letters， 2008， 29（14）： 1947-1953.
40	ZHONG Y， WANG H， YANG W， et al. Multi-objective genetic model for co-clustering ensemble［J］. Applied Soft Computing， 2023， 135： No.110058.
41	RUAN L， YUAN M， ZOU H. Regularized parameter estimation in high-dimensional Gaussian mixture models［J］. Neural Computation， 2011， 23（6）： 1605-1622.
42	ZHAO Y， SHRIVASTAVA A K， TSUI K L. Regularized Gaussian mixture model for high-dimensional clustering［J］. IEEE Transactions on Cybernetics， 2019， 49（10）： 3677-3688.
43	BIERNACKI C， CELEUX G， GOVAERT G. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models ［J］. Computational Statistics and Data Analysis， 2003， 41（3/4）： 561-575.
44	MAITRA R. Initializing partition-optimization algorithms［J］. IEEE/ACM Transactions on Computational Biology and Bioinformatics， 2009， 6（1）： 144-157.
45	MICHAEL S， MELNYKOV V. An effective strategy for initializing the EM algorithm in finite mixture models ［J］. Advances in Data Analysis and Classification， 2016， 10（4）： 563-583.
46	MELNYKOV V， MELNYKOV I. Initializing the EM algorithm in Gaussian mixture models with an unknown number of components［J］. Computational Statistics and Data Analysis， 2012， 56（6）： 1381-1395.
47	PANIĆ B， KLEMENC J， NAGODE M. Improved initialization of the EM algorithm for mixture model parameter estimation［J］. Mathematics， 2020， 8（3）： No.373.
48	GIONIS A， MANNILA H， TSAPARAS P. Clustering aggregation［J］. ACM Transactions on Knowledge Discovery from Data， 2007， 1（1）： No.4-es.
49	AGRAWAL R， GEHRKE J， GUNOPULOS D， et al. Automatic subspace clustering of high dimensional data for data mining applications［C］// Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York： ACM， 1998： 94-105.
50	LI T， DING C， JORDAN M I. Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization［C］// Proceedings of the 7th IEEE International Conference on Data Mining. Piscataway： IEEE， 2007： 577-582.
51	YUAN C， YANG H. Research on K-value selection method of K-means clustering algorithm［J］. J — Multidisciplinary Scientific Journal， 2019， 2（2）： 226-235.
52	IDRUS A， TARIHORAN N， SUPRIATNA U， et al. Distance analysis measuring for clustering using K-Means and Davies Bouldin Index algorithm ［J］. TEM Journal， 2022， 11（4）： 1871-1876.

数据集	样本量	特征数	类别数
Coil20	1 440	1 024	20
MF	2 000	649	10
Lung	203	3 312	5
TOX	171	5 748	4
Semeion	1 593	256	10
Covertype	581 012	54	7
MNIST	70 000	784	10
syn_1	10 000	100	4
syn_2	100 000	100	5

数据集	样本量	特征数	类别数
Coil20	1 440	1 024	20
MF	2 000	649	10
Lung	203	3 312	5
TOX	171	5 748	4
Semeion	1 593	256	10
Covertype	581 012	54	7
MNIST	70 000	784	10
syn_1	10 000	100	4
syn_2	100 000	100	5

评估指标	数据集	样本量	GMM	EAC	MCLA	HGPA	CSPA	LWEA	LWGP	CEAAE	EC-CMS	SGMM-CE
NMI	Coil20	1 440	0.719 8	0.259 7	0.760 2	0.669 2	0.737 1	0.739 7	0.763 2	0.573 1	0.510 6	0.781 3
	MF	2 000	0.656 8	0.578 5	0.653 8	0.639 4	0.621 5	0.680 0	0.709 1	0.680 5	0.623 2	0.715 0
	Lung	203	0.522 9	0.474 6	0.543 6	0.533 7	0.577 2	0.684 1	0.670 9	0.755 0	0.553 5	0.672 1
	TOX	171	0.281 5	0.234 1	0.234 5	0.208 2	0.244 1	0.295 9	0.284 7	0.303 9	0.264 6	0.306 6
	Semeion	1 593	0.628 6	0.517 5	0.589 2	0.601 3	0.533 3	0.643 4	0.642 8	0.603 7	0.581 2	0.642 0
	Covertype	35 000	0.174 8	0.144 7	0.110 3	0.092 4	0.073 4	0.168 2	0.172 9	0.148 2	0.139 8	0.186 2
		45 000	0.171 5	0.144 1	0.102 5	0.090 8	N/A	N/A	N/A	0.150 1	0.142 6	0.184 7
		581 012	0.168 9	N/A	0.099 8	0.089 2	N/A	N/A	N/A	N/A	N/A	0.180 2
	MNIST	35 000	0.687 9	0.541 1	0.574 9	0.663 9	0.585 1	0.691 4	0.685 7	0.673 2	0.625 7	0.696 0
		40 000	0.690 3	0.537 4	0.583 5	0.659 1	0.587 2	N/A	N/A	0.660 8	0.630 1	0.714 3
		70 000	0.698 4	N/A	0.606 8	0.668 5	N/A	N/A	N/A	N/A	N/A	0.720 7
	syn_1	5 000	0.802 7	0.584 0	0.852 7	0.603 4	0.637 0	0.696 0	0.701 8	0.693 1	0.629 5	0.894 6
	syn_1	10 000	0.814 3	0.647 0	0.683 5	0.542 0	0.573 9	0.703 0	0.809 3	0.695 7	0.634 8	0.849 3
	syn_2	50 000	0.819 4	0.648 5	0.689 0	0.537 6	N/A	N/A	N/A	0.719 1	0.784 0	0.851 0
	syn_2	100 000	0.820 6	0.619 4	0.691 4	0.550 3	N/A	N/A	N/A	N/A	N/A	0.837 9
CA	Coil20	1 440	0.539 5	0.100 0	0.655 4	0.549 2	0.637 9	0.594 1	0.648 8	0.503 5	0.518 4	0.656 1
	MF	2 000	0.595 2	0.538 4	0.603 1	0.590 0	0.579 6	0.623 9	0.630 2	0.603 8	0.514 3	0.738 2
	Lung	203	0.596 0	0.523 2	0.678 5	0.585 0	0.612 3	0.703 2	0.699 0	0.924 6	0.663 6	0.767 3
	TOX	171	0.498 4	0.480 7	0.479 2	0.455 6	0.487 1	0.523 8	0.510 0	0.609 3	0.497 3	0.619 0
	Semeion	1 593	0.666 7	0.579 1	0.640 0	0.649 2	0.584 3	0.683 5	0.686 9	0.630 1	0.587 0	0.692 1
	Covertype	35 000	0.520 1	0.427 1	0.393 7	0.371 5	0.328 1	0.496 0	0.512 0	0.446 1	0.421 6	0.535 5
		45 000	0.519 6	0.419 6	0.391 6	0.370 4	N/A	N/A	N/A	0.446 7	0.435 5	0.528 5
		581 012	0.516 0	N/A	0.390 0	0.369 4	N/A	N/A	N/A	N/A	N/A	0.524 6
	MNIST	35 000	0.865 8	0.769 3	0.752 8	0.538 5	0.760 0	0.876 1	0.874 0	0.873 8	0.805 4	0.876 4
		40 000	0.869 2	0.754 5	0.762 7	0.524 5	0.758 3	N/A	N/A	0.872 5	0.809 8	0.869 0
		70 000	0.873 2	N/A	0.769 8	0.537 4	N/A	N/A	N/A	N/A	N/A	0.874 7
	syn_1	5 000	0.683 7	0.528 5	0.728 6	0.803 6	0.523 5	0.563 7	0.570 5	0.559 4	0.537 2	0.906 4
	syn_1	10 000	0.696 3	0.611 7	0.657 4	0.469 3	0.579 4	0.710 3	0.701 7	0.560 0	0.54	0.861 9
	syn_2	50 000	0.701 4	0.610 0	0.658 3	0.468 0	N/A	N/A	N/A	0.679 3	0.693 2	0.819 4
	syn_2	100 000	0.703 5	0.614 4	0.655 0	0.471 0	N/A	N/A	N/A	N/A	N/A	0.803 0
ARI	Coil20	1 440	0.271 7	0.035 1	0.432 0	0.168 0	0.190 0	0.459 2	0.371 5	0.123 9	0.148 3	0.471 0
	MF	2 000	0.418 9	0.311 9	0.433 5	0.323 0	0.417 6	0.327 0	0.455 2	0.346 0	0.367 0	0.445 9
	Lung	203	0.150 5	0.101 9	0.318 2	0.153 0	0.193 1	0.404 3	0.403 6	0.740 3	0.314 8	0.566 7
	TOX	171	0.206 2	0.164 6	0.162 4	0.158 3	0.168 5	0.209 8	0.217 7	0.220 6	0.181 0	0.223 2
	Semeion	1 593	0.466 7	0.345 5	0.461 8	0.348 7	0.413 5	0.542 0	0.537 1	0.404 5	0.353 1	0.550 8
	Covertype	35 000	0.614 7	0.564 2	0.448 5	0.441 6	0.414 7	0.599 8	0.610 1	0.570 3	0.567 7	0.632 5
		45 000	0.613 8	0.561 3	0.443 7	0.428 5	N/A	N/A	N/A	0.570 6	0.569 8	0.629 1
		581 012	0.609 4	N/A	0.439 9	0.421 6	N/A	N/A	N/A	N/A	N/A	0.625 7
	MNIST	35 000	0.747 5	0.118 7	0.124 9	0.158 4	0.119 4	0.754 0	0.749 5	0.745 0	0.671 9	0.763 6
		40 000	0.749 2	0.112 3	0.126 0	0.160 0	0.120 6	N/A	N/A	0.745 2	0.670 5	0.755 7
		70 000	0.754 3	N/A	0.125 3	0.183 7	N/A	N/A	N/A	N/A	N/A	0.760 0
	syn_1	5 000	0.658 2	0.350 8	0.807 9	0.693 0	0.573 8	0.747 3	0.750 4	0.736 0	0.586 5	0.827 5
	syn_1	10 000	0.653 7	0.642 9	0.677 0	0.338 6	0.424 7	0.641 3	0.649 2	0.735 8	0.591 3	0.809 4
	syn_2	50 000	0.655 0	0.651 0	0.673 2	0.340 0	N/A	N/A	N/A	0.663 7	0.653 9	0.809 3
	syn_2	100 000	0.652 7	0.669 4	0.674 7	0.338 6	N/A	N/A	N/A	N/A	N/A	0.813 4

评估指标	数据集	样本量	GMM	EAC	MCLA	HGPA	CSPA	LWEA	LWGP	CEAAE	EC-CMS	SGMM-CE
NMI	Coil20	1 440	0.719 8	0.259 7	0.760 2	0.669 2	0.737 1	0.739 7	0.763 2	0.573 1	0.510 6	0.781 3
	MF	2 000	0.656 8	0.578 5	0.653 8	0.639 4	0.621 5	0.680 0	0.709 1	0.680 5	0.623 2	0.715 0
	Lung	203	0.522 9	0.474 6	0.543 6	0.533 7	0.577 2	0.684 1	0.670 9	0.755 0	0.553 5	0.672 1
	TOX	171	0.281 5	0.234 1	0.234 5	0.208 2	0.244 1	0.295 9	0.284 7	0.303 9	0.264 6	0.306 6
	Semeion	1 593	0.628 6	0.517 5	0.589 2	0.601 3	0.533 3	0.643 4	0.642 8	0.603 7	0.581 2	0.642 0
	Covertype	35 000	0.174 8	0.144 7	0.110 3	0.092 4	0.073 4	0.168 2	0.172 9	0.148 2	0.139 8	0.186 2
		45 000	0.171 5	0.144 1	0.102 5	0.090 8	N/A	N/A	N/A	0.150 1	0.142 6	0.184 7
		581 012	0.168 9	N/A	0.099 8	0.089 2	N/A	N/A	N/A	N/A	N/A	0.180 2
	MNIST	35 000	0.687 9	0.541 1	0.574 9	0.663 9	0.585 1	0.691 4	0.685 7	0.673 2	0.625 7	0.696 0
		40 000	0.690 3	0.537 4	0.583 5	0.659 1	0.587 2	N/A	N/A	0.660 8	0.630 1	0.714 3
		70 000	0.698 4	N/A	0.606 8	0.668 5	N/A	N/A	N/A	N/A	N/A	0.720 7
	syn_1	5 000	0.802 7	0.584 0	0.852 7	0.603 4	0.637 0	0.696 0	0.701 8	0.693 1	0.629 5	0.894 6
	syn_1	10 000	0.814 3	0.647 0	0.683 5	0.542 0	0.573 9	0.703 0	0.809 3	0.695 7	0.634 8	0.849 3
	syn_2	50 000	0.819 4	0.648 5	0.689 0	0.537 6	N/A	N/A	N/A	0.719 1	0.784 0	0.851 0
	syn_2	100 000	0.820 6	0.619 4	0.691 4	0.550 3	N/A	N/A	N/A	N/A	N/A	0.837 9
CA	Coil20	1 440	0.539 5	0.100 0	0.655 4	0.549 2	0.637 9	0.594 1	0.648 8	0.503 5	0.518 4	0.656 1
	MF	2 000	0.595 2	0.538 4	0.603 1	0.590 0	0.579 6	0.623 9	0.630 2	0.603 8	0.514 3	0.738 2
	Lung	203	0.596 0	0.523 2	0.678 5	0.585 0	0.612 3	0.703 2	0.699 0	0.924 6	0.663 6	0.767 3
	TOX	171	0.498 4	0.480 7	0.479 2	0.455 6	0.487 1	0.523 8	0.510 0	0.609 3	0.497 3	0.619 0
	Semeion	1 593	0.666 7	0.579 1	0.640 0	0.649 2	0.584 3	0.683 5	0.686 9	0.630 1	0.587 0	0.692 1
	Covertype	35 000	0.520 1	0.427 1	0.393 7	0.371 5	0.328 1	0.496 0	0.512 0	0.446 1	0.421 6	0.535 5
		45 000	0.519 6	0.419 6	0.391 6	0.370 4	N/A	N/A	N/A	0.446 7	0.435 5	0.528 5
		581 012	0.516 0	N/A	0.390 0	0.369 4	N/A	N/A	N/A	N/A	N/A	0.524 6
	MNIST	35 000	0.865 8	0.769 3	0.752 8	0.538 5	0.760 0	0.876 1	0.874 0	0.873 8	0.805 4	0.876 4
		40 000	0.869 2	0.754 5	0.762 7	0.524 5	0.758 3	N/A	N/A	0.872 5	0.809 8	0.869 0
		70 000	0.873 2	N/A	0.769 8	0.537 4	N/A	N/A	N/A	N/A	N/A	0.874 7
	syn_1	5 000	0.683 7	0.528 5	0.728 6	0.803 6	0.523 5	0.563 7	0.570 5	0.559 4	0.537 2	0.906 4
	syn_1	10 000	0.696 3	0.611 7	0.657 4	0.469 3	0.579 4	0.710 3	0.701 7	0.560 0	0.54	0.861 9
	syn_2	50 000	0.701 4	0.610 0	0.658 3	0.468 0	N/A	N/A	N/A	0.679 3	0.693 2	0.819 4
	syn_2	100 000	0.703 5	0.614 4	0.655 0	0.471 0	N/A	N/A	N/A	N/A	N/A	0.803 0
ARI	Coil20	1 440	0.271 7	0.035 1	0.432 0	0.168 0	0.190 0	0.459 2	0.371 5	0.123 9	0.148 3	0.471 0
	MF	2 000	0.418 9	0.311 9	0.433 5	0.323 0	0.417 6	0.327 0	0.455 2	0.346 0	0.367 0	0.445 9
	Lung	203	0.150 5	0.101 9	0.318 2	0.153 0	0.193 1	0.404 3	0.403 6	0.740 3	0.314 8	0.566 7
	TOX	171	0.206 2	0.164 6	0.162 4	0.158 3	0.168 5	0.209 8	0.217 7	0.220 6	0.181 0	0.223 2
	Semeion	1 593	0.466 7	0.345 5	0.461 8	0.348 7	0.413 5	0.542 0	0.537 1	0.404 5	0.353 1	0.550 8
	Covertype	35 000	0.614 7	0.564 2	0.448 5	0.441 6	0.414 7	0.599 8	0.610 1	0.570 3	0.567 7	0.632 5
		45 000	0.613 8	0.561 3	0.443 7	0.428 5	N/A	N/A	N/A	0.570 6	0.569 8	0.629 1
		581 012	0.609 4	N/A	0.439 9	0.421 6	N/A	N/A	N/A	N/A	N/A	0.625 7
	MNIST	35 000	0.747 5	0.118 7	0.124 9	0.158 4	0.119 4	0.754 0	0.749 5	0.745 0	0.671 9	0.763 6
		40 000	0.749 2	0.112 3	0.126 0	0.160 0	0.120 6	N/A	N/A	0.745 2	0.670 5	0.755 7
		70 000	0.754 3	N/A	0.125 3	0.183 7	N/A	N/A	N/A	N/A	N/A	0.760 0
	syn_1	5 000	0.658 2	0.350 8	0.807 9	0.693 0	0.573 8	0.747 3	0.750 4	0.736 0	0.586 5	0.827 5
	syn_1	10 000	0.653 7	0.642 9	0.677 0	0.338 6	0.424 7	0.641 3	0.649 2	0.735 8	0.591 3	0.809 4
	syn_2	50 000	0.655 0	0.651 0	0.673 2	0.340 0	N/A	N/A	N/A	0.663 7	0.653 9	0.809 3
	syn_2	100 000	0.652 7	0.669 4	0.674 7	0.338 6	N/A	N/A	N/A	N/A	N/A	0.813 4

[1]	王文鹏, 秦寅畅, 师文轩. 工业缺陷检测无监督深度学习方法综述[J]. 《计算机应用》唯一官方网站, 2025, 45(5): 1658-1670.
[2]	潘理虎, 彭守信, 张睿, 薛之洋, 毛旭珍. 面向运动前景区域的视频异常检测[J]. 《计算机应用》唯一官方网站, 2025, 45(4): 1300-1309.
[3]	安俊秀, 杨林旺, 柳源. 基于邻近性语义感知的无监督文本风格迁移[J]. 《计算机应用》唯一官方网站, 2025, 45(4): 1139-1147.
[4]	陈瑞龙, 胡涛, 卜佑军, 伊鹏, 胡先君, 乔伟. 面向加密恶意流量检测模型的堆叠集成对抗防御方法[J]. 《计算机应用》唯一官方网站, 2025, 45(3): 864-871.
[5]	洪梓榕, 包广清. 基于集成学习的雷达自动目标识别综述[J]. 《计算机应用》唯一官方网站, 2025, 45(2): 371-382.
[6]	贾洁茹, 杨建超, 张硕蕊, 闫涛, 陈斌. 基于自蒸馏视觉Transformer的无监督行人重识别[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2893-2902.
[7]	夏吾吉, 黄鹤鸣, 更藏措毛, 范玉涛. 基于无监督学习和监督学习的抽取式文本摘要综述[J]. 《计算机应用》唯一官方网站, 2024, 44(4): 1035-1048.
[8]	江锐, 刘威, 陈成, 卢涛. 非对称端到端的无监督图像去雨网络[J]. 《计算机应用》唯一官方网站, 2024, 44(3): 922-930.
[9]	王星, 刘贵娟, 陈志豪. 高斯混合模型与文本图卷积网络结合的虚假评论识别算法[J]. 《计算机应用》唯一官方网站, 2024, 44(2): 360-368.
[10]	胡能兵, 蔡彪, 李旭, 曹旦华. 基于图池化对比学习的图分类方法[J]. 《计算机应用》唯一官方网站, 2024, 44(11): 3327-3334.
[11]	赵培, 乔焰, 胡荣耀, 袁新宇, 李敏悦, 张本初. 基于多域特征提取的多变量时间序列异常检测[J]. 《计算机应用》唯一官方网站, 2024, 44(11): 3419-3426.
[12]	龙杰, 谢良, 徐海蛟. 集成的深度强化学习投资组合模型[J]. 《计算机应用》唯一官方网站, 2024, 44(1): 300-310.
[13]	黄梦林, 段磊, 张袁昊, 王培妍, 李仁昊. 基于Prompt学习的无监督关系抽取模型[J]. 《计算机应用》唯一官方网站, 2023, 43(7): 2010-2016.
[14]	许喆, 王志宏, 单存宇, 孙亚茹, 杨莹. 基于重构误差的无监督人脸伪造视频检测[J]. 《计算机应用》唯一官方网站, 2023, 43(5): 1571-1577.
[15]	葛孟婷, 万鸣华. 基于近邻监督局部不变鲁棒主成分分析的特征提取模型[J]. 《计算机应用》唯一官方网站, 2023, 43(4): 1013-1020.

基于最大均值差异的子空间高斯混合模型聚类集成算法

Subspace Gaussian mixture model clustering ensemble algorithm based on maximum mean discrepancy

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 9

参考文献 52

相关文章 15

编辑推荐

Metrics