Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1464-1471. DOI: 10.11772/j.issn.1001-9081.2024050651

• Artificial Intelligence •

Graph regularized elastic net subspace clustering

Shujian GUO1,2, Jieyue YU1,2, Xuesong YIN1,2

  1. School of Media and Design, Hangzhou Dianzi University, Hangzhou, Zhejiang 310037, China
  2. Wenzhou Institute of Hangzhou Dianzi University, Wenzhou, Zhejiang 325038, China
  • Received: 2024-05-22 Revised: 2024-09-13 Accepted: 2024-09-26 Online: 2024-10-08 Published: 2025-05-10
  • Contact: Jieyue YU
  • About the authors: GUO Shujian, born in 1999 in Baoding, Hebei, M.S. candidate. Her research interests include machine learning and data mining.
    YU Jieyue, born in 1969 in Heze, Shandong, Ph.D., professor. His research interests include image processing and color management.
    YIN Xuesong, born in 1975 in Changfeng, Anhui, Ph.D., professor. His research interests include machine learning, data mining, and image processing.
  • Supported by:
    Public-Welfare Technology Application Research Project of Zhejiang Province (LGG22F020032); Basic Public-Welfare Research Project of Wenzhou (G2023093)

Abstract:

Graph-based Subspace Clustering (SC) has become a popular technique for effectively processing high-dimensional data. However, existing methods have two problems: the graph is constructed without any link to the clustering task, and it fails to capture the intrinsic correlation structure of the data. To address these problems, a new SC method, Graph regularized Elastic Net Subspace Clustering (GENSC), was proposed. GENSC used L2 norm regularization to strengthen the connectivity among samples with correlated structure, and L1 norm regularization to discard the connectivity among samples from different subspaces. At the same time, a nearest neighbor graph of the representation was constructed to capture the intrinsic local structure among samples, and a rank constraint was added to encourage the learned graph to have a clear clustering structure. GENSC integrated the L2 norm, the L1 norm, and the rank constraint into one general framework, which was solved by an iterative optimization algorithm. In comparisons with existing methods on nine real-world datasets, the Accuracy and Normalized Mutual Information (NMI) of GENSC on ChinaCXRSet exceeded those of the second-best method by 9.03 and 7.61 percentage points, respectively, and its clustering Purity was the best among the compared methods; on UMIST, the Accuracy, NMI, and Purity of GENSC exceeded those of the second-best method by 4.15, 3.17, and 5.21 percentage points, respectively. These results validate the effectiveness of GENSC.
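
The abstract reports Accuracy, NMI, and Purity but does not define them. The sketch below computes the three metrics under their commonly used definitions, which may differ in detail from the ones used in the paper. In particular, normalizing NMI by the geometric mean of the two entropies is an assumption, and the function names `purity`, `clustering_accuracy`, and `nmi` are hypothetical. The brute-force label matching is only practical for a handful of clusters; larger problems usually use the Hungarian algorithm instead.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def purity(true_labels, pred_labels):
    """Fraction of samples that fall in the majority true class of their cluster."""
    per_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(true_labels)


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one matching of cluster ids to class ids."""
    pairs = Counter(zip(pred_labels, true_labels))
    clusters = sorted(set(pred_labels))
    classes = sorted(set(true_labels))
    # Enumerate every injective matching (assumption: few clusters).
    if len(clusters) <= len(classes):
        candidates = (zip(clusters, perm)
                      for perm in permutations(classes, len(clusters)))
    else:
        candidates = (zip(perm, classes)
                      for perm in permutations(clusters, len(classes)))
    best = max(sum(pairs[(p, t)] for p, t in mapping) for mapping in candidates)
    return best / len(true_labels)


def nmi(true_labels, pred_labels):
    """Mutual information divided by the geometric mean of the label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    ct, cp = Counter(true_labels), Counter(pred_labels)
    mi = sum(c / n * log(n * c / (ct[t] * cp[p])) for (t, p), c in joint.items())
    ht = -sum(c / n * log(c / n) for c in ct.values())
    hp = -sum(c / n * log(c / n) for c in cp.values())
    if ht == 0 or hp == 0:
        # Degenerate case: at least one labeling has a single group.
        return 1.0 if ht == hp else 0.0
    return mi / sqrt(ht * hp)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]   # cluster ids are arbitrary
    print(clustering_accuracy(truth, found), nmi(truth, found), purity(truth, found))
```

A gap "in percentage points" between two methods, as quoted in the abstract, is then the plain difference of two such scores multiplied by 100, not a relative improvement.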

Key words: machine learning, subspace clustering, graph regularization, elastic net, rank constraint

CLC Number: