Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (12): 3747-3754.DOI: 10.11772/j.issn.1001-9081.2022111750

• Data science and technology •

Large-scale subspace clustering algorithm with local structure learning

Qize REN, Hongjie JIA, Dongyu CHEN

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  • Received:2022-11-24 Revised:2023-04-30 Accepted:2023-05-12 Online:2023-06-16 Published:2023-12-10
  • Contact: Hongjie JIA
  • About author: REN Qize, born in 1997, M. S. candidate. His research interests include large-scale subspace clustering.
    JIA Hongjie, born in 1988, Ph. D., associate professor, CCF member. His research interests include clustering and machine learning. Email: jiahj@ujs.edu.cn
    CHEN Dongyu, born in 2000. His research interests include subspace clustering.
  • Supported by:
    National Natural Science Foundation of China(61906077)


Abstract:

Conventional large-scale subspace clustering methods ignore the local structure that prevails among the data when computing the anchor affinity matrix, and incur a large error when calculating the approximate eigenvectors of the Laplacian matrix, which is not conducive to data clustering. Aiming at the above problems, a Large-scale Subspace Clustering algorithm with Local structure learning (LLSC) was proposed. In the proposed algorithm, local structure learning was embedded into the learning of the anchor affinity matrix, so that global and local information could be used jointly to mine the subspace structure of the data. In addition, inspired by Nonnegative Matrix Factorization (NMF), an iterative optimization method was designed to simplify the solution of the anchor affinity matrix. Then, the mathematical relationship between the anchor affinity matrix and the Laplacian matrix was established according to the Nyström approximation method, and the calculation method of the eigenvectors of the Laplacian matrix was modified to improve the clustering performance. Compared with LMVSC (Large-scale Multi-View Subspace Clustering), SLSR (Scalable Least Square Regression), LSC-k (Landmark-based Spectral Clustering using k-means), and k-FSC (k-Factorization Subspace Clustering), LLSC demonstrates significant improvements on four widely used large-scale datasets. Specifically, on the Pokerhand dataset, the accuracy of LLSC is 28.18 percentage points higher than that of k-FSC. These results confirm the effectiveness of LLSC.
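The general scheme the abstract describes, representing each sample by nearby anchors and recovering approximate Laplacian eigenvectors from a small anchor-side matrix, can be sketched as follows. This is a minimal illustration of generic anchor-based spectral clustering, not the authors' LLSC implementation; the Gaussian affinity, the row normalization used as a simple local-structure prior, and all function names are assumptions.

```python
# Illustrative sketch of anchor-based spectral clustering with a
# Nystrom-style eigenvector approximation. Not the LLSC algorithm itself:
# the Gaussian kernel and row normalization stand in for the paper's
# learned, locally structured anchor affinity matrix.
import numpy as np

def anchor_affinity(X, anchors, sigma=1.0):
    """Gaussian affinities between n samples and m anchors (n x m),
    row-normalized so each sample is expressed mainly by its nearby
    anchors (a crude local-structure prior)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

def approx_spectral_embedding(Z, k):
    """Nystrom-style trick: for the n x n similarity S = Z D^{-1} Z^T
    (D = diag of anchor degrees), eigenvectors of S are lifted from the
    small m x m matrix (Z D^{-1/2})^T (Z D^{-1/2}), so the full n x n
    Laplacian is never formed. Returns the top-k embedding (n x k)."""
    D = Z.sum(axis=0)                       # anchor degrees (length m)
    Zh = Z / np.sqrt(D)                     # Z D^{-1/2}
    M = Zh.T @ Zh                           # m x m, cheap to eigendecompose
    vals, vecs = np.linalg.eigh(M)          # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]        # top-k
    # If M v = lam * v, then S (Zh v) = lam * (Zh v); rescale for orthonormality.
    U = Zh @ vecs[:, idx] / np.sqrt(vals[idx])
    return U
```

The embedding `U` would then be fed to k-means, as in standard spectral clustering; the point of the anchor construction is that every step costs O(nm) or O(m^3) rather than O(n^2) or O(n^3), which is what makes such methods scale to large datasets.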

Key words: subspace clustering, local structure learning, Nonnegative Matrix Factorization (NMF), large-scale clustering, low-rank approximation

