Journal of Computer Applications


Graph regularized Elastic Net Subspace Clustering

  

  • Received: 2024-05-21 Revised: 2024-09-13 Online: 2024-10-08 Published: 2024-10-08


郭书剑,余节约,尹学松   

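  1. Hangzhou Dianzi University
  • Corresponding author: 郭书剑
  • Funding:
    Research on Key Technologies and Applications of Feature Learning for Multi-Source Heterogeneous Data; Wenzhou Basic Public Welfare Scientific Research Project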

Abstract: Graph-based Subspace Clustering (SC) has become a popular technique for efficiently handling high-dimensional data. However, existing methods suffer from two problems: the constructed graph is not tied to the clustering task, and it fails to capture the intrinsic correlation structure of the data. To address these issues, a new subspace clustering method, Graph regularized Elastic Net Subspace Clustering (GENSC), is proposed. GENSC uses L2-norm regularization to strengthen the connectivity among samples with correlated structure, and L1-norm regularization to discard the connectivity among samples from different subspaces. It further constructs a nearest-neighbor graph of the representation to capture the intrinsic local structure of the samples, and imposes a rank constraint on representation learning so that the learned graph has a clearer clustering structure. GENSC integrates the L2 norm, the L1 norm, and the rank constraint into a general framework, which is solved by a proposed iterative optimization algorithm. Compared with existing methods on 9 real-world datasets, GENSC achieves maximum improvements of 9.03% in ACC, 7.61% in NMI, and 5.21% in Purity, demonstrating its effectiveness.

Key words: machine learning, subspace clustering, graph regularization, elastic net, rank constraint
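
The abstract gives no formulas, so the following is only a minimal sketch of the elastic-net ingredient it describes, not the GENSC algorithm itself. It assumes a self-representation objective of the form 0.5*||X - XC||_F^2 + lam1*||C||_1 + 0.5*lam2*||C||_F^2 with a zero diagonal, solved here by plain proximal gradient descent; the objective form, the solver, and all names (`elastic_net_self_representation`, `count_components`, `lam1`, `lam2`) are assumptions made for illustration. The graph regularizer and the rank constraint are not part of the optimization. The second function only illustrates the standard fact that usually motivates such a constraint: a graph has k connected components exactly when its Laplacian has k zero eigenvalues.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def elastic_net_self_representation(X, lam1=0.1, lam2=0.1, n_iter=500):
    """Hypothetical helper: each column of X (d x n) is one sample.

    Minimizes 0.5*||X - XC||_F^2 + lam1*||C||_1 + 0.5*lam2*||C||_F^2
    subject to diag(C) = 0, using proximal gradient descent.
    """
    n = X.shape[1]
    G = X.T @ X  # Gram matrix of the samples
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part.
    step = 1.0 / (np.linalg.eigvalsh(G)[-1] + lam2)
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G + lam2 * C          # gradient of the L2 terms
        C = soft_threshold(C - step * grad, step * lam1)  # L1 proximal step
        np.fill_diagonal(C, 0.0)             # a sample may not represent itself
    return C


def count_components(C, tol=1e-8):
    """Count near-zero eigenvalues of the Laplacian of the symmetrized graph."""
    W = 0.5 * (np.abs(C) + np.abs(C).T)      # symmetric non-negative affinity
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    return int(np.sum(np.linalg.eigvalsh(L) < tol))


# Toy data (an assumption): 10 samples in span(e1, e2), 10 in span(e3, e4).
rng = np.random.default_rng(0)
X = np.zeros((4, 20))
X[:2, :10] = rng.standard_normal((2, 10))
X[2:, 10:] = rng.standard_normal((2, 10))

C = elastic_net_self_representation(X)
print("cross-subspace weight:", np.abs(C[:10, 10:]).sum())
print("near-zero Laplacian eigenvalues:", count_components(C))
```

On this toy data the two subspaces are orthogonal, so the coefficients linking samples from different subspaces stay at zero and the affinity graph splits into at least two components. Real data are rarely this clean, which is presumably why the method adds the graph regularizer and the rank constraint on top of the elastic-net terms.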

