Subspace Gaussian mixture model clustering ensemble algorithm based on maximum mean discrepancy
Yulin HE, Xu LI, Yingting HE, Laizhong CUI, Zhexue HUANG
Journal of Computer Applications    2025, 45 (6): 1712-1723.   DOI: 10.11772/j.issn.1001-9081.2024070943

To address the limited capability and parameter sensitivity of Gaussian Mixture Model (GMM) clustering algorithms on large-scale high-dimensional data, a Subspace GMM Clustering Ensemble (SGMM-CE) algorithm based on Maximum Mean Discrepancy (MMD) was proposed. Firstly, Random Sample Partition (RSP) was performed on the original large-scale high-dimensional dataset to obtain multiple data subsets, thereby reducing the scale of the clustering problem in terms of sample size. Secondly, subspace learning was performed in the high-dimensional feature space of each data subset, taking into account the influence of features on the optimal number of GMM components, so that multiple low-dimensional feature subspaces were obtained for each high-dimensional feature space; GMM clustering was then conducted on each subspace to obtain a series of heterogeneous GMMs. Thirdly, the GMM clustering results of different subspaces from the same data subset were relabeled and merged on the basis of the proposed Average Shared Affiliation Probability (ASAP). Finally, the extended Subspace MMD (SubMMD) was used as a criterion to measure the distributional consistency between two clusters drawn from the clustering results of different data subsets, and the clustering results of these subsets were relabeled and merged accordingly, yielding the final clustering ensemble result for the original dataset. Extensive experiments were conducted to validate the effectiveness of the SGMM-CE algorithm. Experimental results show that, compared with the best-performing comparison algorithm, Meta-CLustering Algorithm (MCLA), the SGMM-CE algorithm improves Normalized Mutual Information (NMI), Clustering Accuracy (CA), and Adjusted Rand Index (ARI) by 19%, 20%, and 52%, respectively, on the given clustering datasets. In addition, the feasibility and rationality experiments show that the SGMM-CE algorithm exhibits parameter convergence and time efficiency, demonstrating that it can handle large-scale high-dimensional data clustering problems effectively.
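
The first and last steps of the abstract (partitioning the data into random blocks, then using MMD to judge whether two clusters follow the same distribution) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses the standard biased estimator of squared MMD with a Gaussian kernel rather than the paper's SubMMD, and approximates RSP with a plain shuffle-and-split. All function names, the kernel width `gamma`, and the toy data are assumptions made for illustration.

```python
import math
import random


def random_sample_partition(data, n_blocks, seed=0):
    """Shuffle row indices and deal them into n_blocks disjoint blocks.

    Simplified stand-in for RSP (an assumption, not the paper's procedure):
    each block is a random sample of the full dataset.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    return [[data[i] for i in indices[b::n_blocks]] for b in range(n_blocks)]


def rbf_kernel(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    total = sum(rbf_kernel(a, b, gamma) for a in A for b in B)
    return total / (len(A) * len(B))


def mmd_squared(X, Y, gamma=0.5):
    """Biased estimate of squared MMD between samples X and Y."""
    return (mean_kernel(X, X, gamma)
            + mean_kernel(Y, Y, gamma)
            - 2.0 * mean_kernel(X, Y, gamma))


# Toy data (hypothetical): two 2-D Gaussian clusters centred at (0, 0) and (3, 3).
rng = random.Random(42)
cluster_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(120)]
cluster_b = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(120)]

# Split cluster A into two blocks; both come from the same distribution.
block_1, block_2 = random_sample_partition(cluster_a, n_blocks=2)

mmd_same = mmd_squared(block_1, block_2)    # clusters that should be merged
mmd_diff = mmd_squared(block_1, cluster_b)  # clusters that should stay apart

print(f"same distribution:      {mmd_same:.4f}")
print(f"different distribution: {mmd_diff:.4f}")
```

A small MMD value suggests that two clusters found in different data subsets describe the same underlying group and could share a label, while a large value suggests they should remain separate. Any merge threshold would have to be chosen for the data at hand; the abstract does not specify one.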
