Yulin HE, Yingting HE, Zhaowu ZHAN, et al., “A Novel Subspace-Based GMM Clustering Ensemble Algorithm for High-dimensional Data,” Chinese Journal of Electronics, vol. x, no. x, pp. 1–18, xxxx, doi: 10.23919/cje.2023.00.153

A Novel Subspace-Based GMM Clustering Ensemble Algorithm for High-dimensional Data

doi: 10.23919/cje.2023.00.153
More Information
  • Author Bios:

    Yulin HE received the Ph.D. degree from Hebei University, China in 2014. From 2011 to 2014, he served as a Research Fellow with the Department of Computing, The Hong Kong Polytechnic University, China. From 2014 to 2017, he worked as a Post-doctoral Fellow in the College of Computer Science and Software Engineering, Shenzhen University, China. He is currently a Research Fellow with the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), China. His main research interests include big data approximate computing technologies, multi-sample statistical analysis theories and methods, and data mining/machine learning algorithms and their applications. He has published over 100 research papers in ACM, CAAI, and IEEE Transactions, in Elsevier and Springer journals, and at conferences such as PAKDD, IJCNN, CEC, and DASFAA. He is an ACM, CAAI, CCF, and IEEE member, and an Editorial Review Board member of several international journals. (Email: yulinhe@gml.ac.cn)

    Yingting HE is currently pursuing her master's degree with the College of Computer Science and Software Engineering at Shenzhen University, Shenzhen, China. Her main research interests include data mining and machine learning algorithms and applications, big data processing and analysis, and big data system computing technology. (Email: 2110276016@email.szu.edu.cn)

    Zhaowu ZHAN received the Ph.D. degree in Electrical Engineering from the Institut National des Sciences Appliquées de Lyon (INSA Lyon) and is currently a vice general manager of the R&D center of China Gridcom Co., Ltd. He is engaged in core technology research and development in the fields of telecommunication and artificial intelligence applied to electric power systems. (Email: zhanzhaowu@sgchip.sgcc.com.cn)

    Philippe FOURNIER-VIGER is a distinguished professor at the College of Computer Science and Software Engineering at Shenzhen University, China. He obtained a national talent title from the National Natural Science Foundation of China. He has published more than 300 research papers related to data mining, big data, intelligent systems, and applications, which have received more than 10,000 citations (H-index 51). He is the editor-in-chief of the Data Science and Pattern Recognition journal and a former associate editor-in-chief of the Applied Intelligence journal (SCI, Q1). He is the founder of the SPMF data mining library, which offers more than 230 algorithms and has been used in more than 1,000 research papers. He is a co-founder of the UDML, PMDB, and MLiSE workshop series held at the ICDM, PKDD, DASFAA, and KDD conferences. His interests are data mining, algorithm design, pattern mining, sequence mining, big data, and applications. (Email: philfv@szu.edu.cn)

    Joshua Zhexue HUANG received the Ph.D. degree from The Royal Institute of Technology, Stockholm, Sweden in 1993. He is currently a Distinguished Professor with the College of Computer Science and Software Engineering, Shenzhen University, China. He is also the Director of the Big Data Institute, China, and the Deputy Director of the National Engineering Laboratory for Big Data System Computing Technology. He has published over 200 research papers in conferences and journals. His main research interests include big data technology and applications. Prof. Huang received the first PAKDD Most Influential Paper Award in 2006. He is known for his contributions to the development of a series of k-means-type clustering algorithms in data mining, such as k-modes, fuzzy k-modes, k-prototypes, and w-k-means, which are widely cited and used, and some of which have been included in commercial software. He has extensive industry expertise in business intelligence and data mining and has been involved in numerous consulting projects in many countries. (Email: zx.huang@szu.edu.cn)

  • Corresponding author: Email: yulinhe@gml.ac.cn
  • Received Date: 2023-04-26
  • Accepted Date: 2024-02-21
  • Available Online: 2024-06-05
  • The Gaussian mixture model (GMM) is a classical probability representation model widely used in unsupervised learning. GMM performs poorly on high-dimensional data (HDD) because it must estimate a large number of parameters from relatively few observations. To address this, the paper proposes a novel subspace-based GMM clustering ensemble (SubGMM-CE) algorithm tailored for HDD. The SubGMM-CE algorithm comprises three key components. First, a series of low-dimensional subspaces is dynamically determined, taking into account the optimal number of GMM components. Second, the GMM-based clustering algorithm is applied to each subspace to obtain a series of heterogeneous GMM models. Third, the GMM base clustering results are merged using the newly designed relabeling strategy based on the average shared affiliation probability, generating the final clustering result for high-dimensional unlabeled data. An exhaustive experimental evaluation validates the feasibility, rationality, effectiveness, and robustness to noise of the SubGMM-CE algorithm. Results show that SubGMM-CE achieves higher stability and more accurate clustering results, outperforming nine state-of-the-art clustering algorithms in normalized mutual information, clustering accuracy, and adjusted Rand index scores. This demonstrates the viability of the SubGMM-CE algorithm in addressing HDD clustering challenges. (An illustrative code sketch of this pipeline is given after the footnotes below.)
  • Footnotes:
    1. https://jundongl.github.io/scikit-feature/datasets.html
    2. https://archive.ics.uci.edu/ml/index.php
    3. https://pan.baidu.com/s/1FNy3hV_CB6OG27JcBRsA9w
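
As a rough illustration of the three-step pipeline the abstract outlines, the following minimal Python sketch builds feature subspaces, fits one GMM per subspace, relabels the base partitions against a reference, and averages the affiliation probabilities into a consensus. This is a sketch under stated assumptions, not the paper's implementation: the function name subgmm_ce_sketch is hypothetical; subspaces are drawn by plain random feature sampling rather than the paper's dynamic determination; every base GMM is assumed to share a common component count k with diagonal covariances; and a Hungarian alignment of soft co-membership overlaps stands in for the paper's average-shared-affiliation-probability relabeling.

    # Minimal sketch of a subspace GMM clustering ensemble (not the paper's code).
    # Assumptions: random feature subspaces, a shared component count k, and a
    # Hungarian alignment as a stand-in for the paper's relabeling strategy.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.mixture import GaussianMixture

    def subgmm_ce_sketch(X, k, n_subspaces=20, subspace_dim=10, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        ref = None                      # reference soft partition (first base model)
        avg = np.zeros((n, k))          # running sum of aligned probabilities
        for i in range(n_subspaces):
            feats = rng.choice(d, size=min(subspace_dim, d), replace=False)
            gmm = GaussianMixture(n_components=k, covariance_type="diag",
                                  random_state=seed + i).fit(X[:, feats])
            probs = gmm.predict_proba(X[:, feats])  # n x k affiliation probabilities
            if ref is None:
                ref = probs
            else:
                # Relabel: permute columns so each base component matches the
                # reference component it overlaps most (maximize total overlap).
                overlap = ref.T @ probs             # k x k soft co-membership
                _, perm = linear_sum_assignment(-overlap)
                probs = probs[:, perm]
            avg += probs
        avg /= n_subspaces              # averaged affiliation probabilities
        return avg.argmax(axis=1)       # consensus hard labels

    # Example usage on synthetic high-dimensional data with three clusters:
    if __name__ == "__main__":
        gen = np.random.default_rng(1)
        X = np.vstack([gen.normal(c, 1.0, size=(100, 50)) for c in (0.0, 3.0, 6.0)])
        labels = subgmm_ce_sketch(X, k=3)
        print(np.bincount(labels))      # rough cluster sizes

In the paper itself, the number of components is selected per subspace (a criterion such as GaussianMixture.bic could play that role in this sketch), and the merge is driven by the average shared affiliation probability rather than a single reference alignment; the sketch only mirrors the overall ensemble shape of subspacing, base clustering, relabeling, and merging.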