Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (1): 36-42. DOI: 10.11772/j.issn.1001-9081.2020061005

Special topic: The 8th China Conference on Data Mining (CCDM 2020)

Improved block diagonal subspace clustering algorithm based on neighbor graph

WANG Lijuan1, CHEN Shaomin1, YIN Ming2, XU Yueying3, HAO Zhifeng1,4, CAI Ruichu1, WEN Wen1

  1. School of Computers, Guangdong University of Technology, Guangzhou, Guangdong 510006, China;
  2. School of Automation, Guangdong University of Technology, Guangzhou, Guangdong 510006, China;
  3. School of Information Technology, Beijing Normal University, Zhuhai, Zhuhai, Guangdong 519000, China;
  4. School of Mathematics and Big Data, Foshan University, Foshan, Guangdong 528000, China
  • Received: 2020-05-31 Revised: 2020-09-02 Online: 2021-01-10 Published: 2021-01-16
  • Corresponding author: YIN Ming
  • About the authors:
    WANG Lijuan (born 1978), female, from Xingtai, Hebei, associate professor, Ph.D.; research interests: data mining, machine learning.
    CHEN Shaomin (born 1994), female, from Shantou, Guangdong, M.S. candidate; research interests: data mining, subspace clustering.
    YIN Ming (born 1975), male, from Yongzhou, Hunan, associate professor, Ph.D.; research interests: machine learning, pattern recognition, image processing.
    XU Yueying (born 1984), male, from Shantou, Guangdong, lecturer, M.S.; research interests: Internet applications.
    HAO Zhifeng (born 1968), male, from Suzhou, Jiangsu, professor, Ph.D.; research interests: machine learning, artificial intelligence.
    CAI Ruichu (born 1983), male, from Wenzhou, Zhejiang, professor, Ph.D.; research interests: machine learning, data mining.
    WEN Wen (born 1981), female, from Ganzhou, Jiangxi, associate professor, Ph.D.; research interests: support vector machine, pattern recognition.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61502108, 61876042, 61876043) and the NSFC-Guangdong Joint Fund (U1501254).

Abstract: The Block Diagonal Representation (BDR) model can cluster data effectively by using linear representation, but it cannot make good use of the nonlinear manifold structure information that commonly appears in high-dimensional data. To solve this problem, an improved Block Diagonal Representation based on Neighbor Graph (BDRNG) clustering algorithm was proposed, which linearly fits the local geometric structure of high-dimensional data through a neighbor graph and generates a block-diagonal structure carrying global information through a block-diagonal constraint. BDRNG learns global information and local data structure simultaneously, and thereby achieves better clustering performance. Because the model contains both the neighbor graph operator and the non-convex block-diagonal representation norm, BDRNG adopts alternating minimization to solve the optimization problem. Experimental results are as follows: on the noisy dataset, BDRNG generates a stable coefficient matrix with block-diagonal structure, which indicates that BDRNG is robust to noisy data; on the standard datasets, BDRNG consistently outperforms BDR in clustering performance, and on the face dataset in particular its clustering accuracy is 8% higher than that of BDR.

Key words: neighbor graph, Block Diagonal Representation (BDR), sparse representation, subspace clustering, high-dimensional data
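
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the BDRNG implementation: the abstract does not state the objective function, so the code only shows (a) a symmetric k-nearest-neighbor graph and (b) the block-diagonal regularizer in the form commonly given for the original BDR model, namely the sum of the k smallest eigenvalues of the Laplacian of an affinity matrix. The function names `knn_graph` and `block_diag_regularizer`, the toy data, and the use of a 0/1 graph are assumptions made for illustration only.

```python
import numpy as np


def knn_graph(X, n_neighbors):
    """Symmetric 0/1 k-nearest-neighbor adjacency for the rows of X (illustrative)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]     # nearest neighbors per row
    W = np.zeros(d2.shape)
    rows = np.repeat(np.arange(X.shape[0]), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # keep an edge if either end chose it


def block_diag_regularizer(B, k):
    """Sum of the k smallest eigenvalues of the graph Laplacian of affinity B.

    Assumes B is symmetric and nonnegative; the value is zero exactly when
    the graph of B has at least k connected components.
    """
    L = np.diag(B.sum(axis=1)) - B
    eigvals = np.linalg.eigvalsh(L)                   # returned in ascending order
    return float(np.sum(eigvals[:k]))


# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
W = knn_graph(X, n_neighbors=2)
reg_two_blocks = block_diag_regularizer(W, k=2)
reg_three_blocks = block_diag_regularizer(W, k=3)
print(reg_two_blocks, reg_three_blocks)
```

On this toy data the neighbor graph splits into two connected components, so the regularizer is numerically zero for k = 2 and positive for k = 3. This is the property that lets such a regularizer favor a coefficient matrix with a prescribed number of diagonal blocks.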

CLC number: