Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (12): 3747-3754.DOI: 10.11772/j.issn.1001-9081.2022111750

• Data science and technology •

Large-scale subspace clustering algorithm with local structure learning

Qize REN, Hongjie JIA, Dongyu CHEN

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  • Received:2022-11-24 Revised:2023-04-30 Accepted:2023-05-12 Online:2023-06-16 Published:2023-12-10
  • Contact: Hongjie JIA
  • About author: REN Qize, born in 1997, M. S. candidate. His research interests include large-scale subspace clustering.
    JIA Hongjie, born in 1988, Ph. D., associate professor, CCF member. His research interests include clustering and machine learning. Email: jiahj@ujs.edu.cn
    CHEN Dongyu, born in 2000. His research interests include subspace clustering.
  • Supported by:
    National Natural Science Foundation of China(61906077)


Abstract:

Conventional large-scale subspace clustering methods ignore the local structure that prevails among the data when computing the anchor affinity matrix, and incur a large error when calculating the approximate eigenvectors of the Laplacian matrix, which is not conducive to data clustering. Aiming at the above problems, a Large-scale Subspace Clustering algorithm with Local structure learning (LLSC) was proposed. In the proposed algorithm, local structure learning was embedded into the learning of the anchor affinity matrix, so that global and local information could be used jointly to mine the subspace structure of the data. In addition, inspired by Nonnegative Matrix Factorization (NMF), an iterative optimization method was designed to simplify the solution of the anchor affinity matrix. Then, the mathematical relationship between the anchor affinity matrix and the Laplacian matrix was established according to the Nyström approximation method, and the calculation method of the eigenvectors of the Laplacian matrix was modified to improve the clustering performance. Compared with LMVSC (Large-scale Multi-View Subspace Clustering), SLSR (Scalable Least Square Regression), LSC-k (Landmark-based Spectral Clustering using k-means), and k-FSC (k-Factorization Subspace Clustering), LLSC demonstrates significant improvements on four widely used large-scale datasets. Specifically, on the Pokerhand dataset, the accuracy of LLSC is 28.18 percentage points higher than that of k-FSC. These results confirm the effectiveness of LLSC.
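The general scheme the abstract describes, representing each sample by nearby anchors and recovering approximate Laplacian eigenvectors from a small anchor-side matrix, can be sketched as follows. This is a minimal illustration of generic anchor-based spectral clustering, not the authors' LLSC implementation; the Gaussian affinity, the row normalization used as a simple local-structure prior, and all function names are assumptions.

```python
# Illustrative sketch of anchor-based spectral clustering with a
# Nystrom-style eigenvector approximation. Not the LLSC algorithm itself:
# the Gaussian kernel and row normalization stand in for the paper's
# learned, locally structured anchor affinity matrix.
import numpy as np

def anchor_affinity(X, anchors, sigma=1.0):
    """Gaussian affinities between n samples and m anchors (n x m),
    row-normalized so each sample is expressed mainly by its nearby
    anchors (a crude local-structure prior)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

def approx_spectral_embedding(Z, k):
    """Nystrom-style trick: for the n x n similarity S = Z D^{-1} Z^T
    (D = diag of anchor degrees), eigenvectors of S are lifted from the
    small m x m matrix (Z D^{-1/2})^T (Z D^{-1/2}), so the full n x n
    Laplacian is never formed. Returns the top-k embedding (n x k)."""
    D = Z.sum(axis=0)                       # anchor degrees (length m)
    Zh = Z / np.sqrt(D)                     # Z D^{-1/2}
    M = Zh.T @ Zh                           # m x m, cheap to eigendecompose
    vals, vecs = np.linalg.eigh(M)          # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]        # top-k
    # If M v = lam * v, then S (Zh v) = lam * (Zh v); rescale for orthonormality.
    U = Zh @ vecs[:, idx] / np.sqrt(vals[idx])
    return U
```

The embedding `U` would then be fed to k-means, as in standard spectral clustering; the point of the anchor construction is that every step costs O(nm) or O(m^3) rather than O(n^2) or O(n^3), which is what makes such methods scale to large datasets.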

Key words: subspace clustering, local structure learning, Nonnegative Matrix Factorization (NMF), large-scale clustering, low-rank approximation

