Robust multi-view subspace clustering based on consistency graph learning

Journal of Computer Applications, 2021, Vol. 41, Issue (12): 3438-3446. DOI: 10.11772/j.issn.1001-9081.2021061056

The 18th China Conference on Machine Learning

Zhenjun PAN, Cheng LIANG, Huaxiang ZHANG

School of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250358, China

Received: 2021-05-12 Revised: 2021-07-16 Accepted: 2021-07-26 Online: 2021-12-28 Published: 2021-12-10
Contact: Cheng LIANG
About author: PAN Zhenjun, born in 1996, a native of Weifang, Shandong, M. S. candidate, CCF member. Her research interests include machine learning and biomedical big data analysis.
ZHANG Huaxiang, born in 1966, a native of Wenshang, Shandong, Ph. D., professor, CCF member. His research interests include pattern recognition, multimodal data retrieval and pedestrian re-identification.
Supported by:
the Joint Funds of National Natural Science Foundation of China (U1836216); the Surface Program of National Natural Science Foundation of China (61873089); the Major Fundamental Research Project of Shandong Province (ZR2019ZD03)

Chinese title: 基于一致图学习的鲁棒多视图子空间聚类

Abstract

To address the problems that multi-view data analysis is susceptible to noise in the original dataset and requires an additional step to compute the clustering results, a Robust Multi-view subspace clustering algorithm based on Consistency Graph Learning (RMCGL) was proposed. Firstly, a latent robust representation of the data in the subspace was learned in each view, and the similarity matrix of each view was obtained from these representations. Then, a unified similarity graph was learned from the obtained similarity matrices. Finally, a rank constraint was imposed on the Laplacian matrix of the similarity graph, so that the learned graph had the optimal clustering structure and the final clustering results could be obtained directly from it. The whole process was completed in a unified optimization framework in which the latent robust representations, the similarity matrices and the consistency graph were learned simultaneously. On the BBC, 100leaves and MSRC datasets, the clustering Accuracy (ACC) of RMCGL was 3.36, 5.82 and 5.71 percentage points higher, respectively, than that of the Graph-based Multi-view Clustering (GMC) algorithm. Experimental results show that the proposed algorithm has a good clustering effect.

Key words: multi-view, consistency graph, subspace, clustering, self-weighting, graph learning
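
The rank constraint mentioned in the abstract rests on a standard fact from spectral graph theory: for a nonnegative similarity graph, the number of zero eigenvalues of its Laplacian equals the number of connected components. A graph whose Laplacian has exactly c zero eigenvalues therefore splits into c groups, and cluster labels can be read off without a separate k-means step. The sketch below only illustrates that fact on a toy graph; it is not the RMCGL optimization itself, and the function name `labels_from_graph`, the tolerance and the toy matrix are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def labels_from_graph(G, tol=1e-8):
    """Count zero Laplacian eigenvalues and read labels from connected components.

    G is assumed to be a nonnegative n x n similarity graph (hypothetical input).
    """
    W = (G + G.T) / 2.0                      # symmetrize the similarity graph
    L = np.diag(W.sum(axis=1)) - W           # unnormalized Laplacian L = D - W
    eigvals = np.linalg.eigvalsh(L)          # real eigenvalues of the symmetric L
    n_zero = int(np.sum(eigvals < tol))      # multiplicity of the eigenvalue 0
    adjacency = csr_matrix((W > tol).astype(float))
    n_comp, labels = connected_components(adjacency, directed=False)
    return n_zero, n_comp, labels


# Toy graph with two disconnected blocks: samples {0, 1, 2} and {3, 4}.
G = np.zeros((5, 5))
G[:3, :3] = 0.5
G[3:, 3:] = 0.5
np.fill_diagonal(G, 0.0)

n_zero, n_comp, labels = labels_from_graph(G)
print(n_zero, n_comp, labels)
```

On this toy graph the number of zero eigenvalues and the number of connected components agree, which is the property the rank constraint is meant to enforce on the learned graph.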

CLC Number: TP181

Zhenjun PAN, Cheng LIANG, Huaxiang ZHANG. Robust multi-view subspace clustering based on consistency graph learning[J]. Journal of Computer Applications, 2021, 41(12): 3438-3446.

Figures/Tables 16

Fig. 1 Flowchart of RMCGL algorithm

Tab. 1 Common symbols and their definitions

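Symbol	Meaning
n	Number of samples
V	Number of views
d_v	Feature dimension of the v-th view
$X^v = [X_1^v, X_2^v, \cdots, X_n^v] \in \mathbb{R}^{d_v \times n}$	Data matrix of the v-th view
$A^v \in \mathbb{R}^{d_v \times n}$	Representation matrix of the v-th view
$E^v \in \mathbb{R}^{d_v \times n}$	Error matrix of the v-th view
$S^v \in \mathbb{R}^{n \times n}$	Similarity matrix of the v-th view
$G \in \mathbb{R}^{n \times n}$	Consistency graph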

Tab. 2 Detailed description of six real-world datasets

Dataset	Size	Views	Classes	Features v₁	Features v₂	Features v₃	Features v₄	Features v₅	Features v₆
100leaves	1600	3	100	64	64	64
BBC	685	4	5	4659	4633	4665	4684
NGs	500	3	5	2000	2000	2000
Caltech101-20	2386	6	20	48	40	254	1984	512	928
COIL20	1440	3	20	512	1239	324
MSRC	210	5	7	1302	48	512	256	210

Tab. 3 Time complexity comparison

Algorithm	Time complexity
GBS	$O(((Vk + Vn + c + cn)n)t + Vnkd)$
Multi-NMF	$O(t_{out} t_{in} V d n k)$
CoregSC	$O(tcn^2)$
Cotrain	$O(tcn^2 + Vnkd)$
MLAN	$O(t(cn^2 + Vn^2 + n^2) + n^2)$
GMC	$O(((Vk + Vn + c + cn)n)t + Vnkd)$
RMCGL	$O(t(Vnk + cn + cn^2 + Vn^2) + Vnkd)$

Tab. 4 ACC values (%) of different clustering algorithms on different datasets

Algorithm	100leaves	BBC	NGs	Caltech101-20	COIL20	MSRC	synthetic	two Gaussian
NMF	41.71	51.02	45.73	34.58	66.18	50.38	32.57	70.33
CAN	50.71	40.08	26.80	44.76	76.27	42.09	36.00	27.83
K-means	41.33	46.59	24.07	36.49	53.65	52.43	53.11	82.50
GBS	82.44	69.34	98.20	54.27	82.99	72.86	53.50	90.00
Multi-NMF	64.54	41.49	26.32	38.01	65.96	57.32	25.50	60.50
CoregSC	77.02	46.83	28.60	32.66	68.93	77.29	55.94	89.50
Cotrain	78.38	63.75	67.26	40.43	77.96	75.19	53.60	89.75
MLAN	87.94	69.05	96.40	53.48	81.67	73.33	56.00	90.00
GMC	82.37	69.34	98.20	45.64	82.99	72.86	53.50	90.00
RMCGL	88.19	72.70	98.40	60.90	83.75	78.57	59.50	90.00
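
Clustering accuracy is commonly computed by finding the one-to-one mapping between predicted clusters and true classes that maximizes the number of matched samples. The sketch below assumes this standard definition (the page does not spell out the formula) and uses the Hungarian algorithm from SciPy; the function name `clustering_accuracy` and the toy labels are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one matching of clusters to classes (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    _, t = np.unique(y_true, return_inverse=True)   # class indices 0..C-1
    _, p = np.unique(y_pred, return_inverse=True)   # cluster indices 0..K-1
    counts = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(counts, (p, t), 1)                    # contingency table: clusters x classes
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# Toy example: one of six samples ends up in the wrong cluster.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
print(f"ACC = {100 * acc:.2f}%")
```

Multiplying the returned fraction by 100 gives values on the same scale as the table.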

Tab. 5 NMI values (%) of different clustering algorithms on different datasets

Algorithm	100leaves	BBC	NGs	Caltech101-20	COIL20	MSRC	synthetic	two Gaussian
NMF	67.66	33.09	28.34	40.07	74.37	40.06	21.81	22.74
CAN	71.63	12.15	10.07	26.78	89.75	19.72	17.91	3.19
K-means	67.52	27.14	4.87	42.11	69.07	41.40	42.40	38.75
GBS	93.43	56.28	93.92	51.79	92.42	77.22	49.44	61.90
Multi-NMF	84.30	25.53	11.64	49.38	79.54	48.61	5.85	3.16
CoregSC	92.01	24.49	8.06	50.14	82.30	67.87	40.02	57.96
Cotrain	86.96	36.00	26.93	48.98	82.20	60.59	48.99	59.27
MLAN	94.41	49.08	89.18	47.13	91.18	76.74	37.87	61.00
GMC	92.92	56.28	93.92	48.09	92.42	77.22	49.44	61.90
RMCGL	95.47	63.34	94.44	57.77	94.33	77.54	54.91	61.90
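
Normalized mutual information compares the predicted partition with the true one through their joint label distribution. Several normalizations are in use; the sketch below assumes the geometric-mean form, NMI = I(T; P) / sqrt(H(T) H(P)), which may differ from the variant used for the table. The function name `nmi` and the convention of returning 0 for a degenerate partition are assumptions made for illustration.

```python
import numpy as np


def nmi(y_true, y_pred):
    """NMI with geometric-mean normalization (assumed variant)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    joint = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(joint, (t, p), 1.0)
    joint /= joint.sum()                       # joint distribution of (class, cluster)
    pt = joint.sum(axis=1)                     # marginal over classes
    pp = joint.sum(axis=0)                     # marginal over clusters
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    h_t = -np.sum(pt * np.log(pt))             # entropies; marginals are strictly positive
    h_p = -np.sum(pp * np.log(pp))
    denom = np.sqrt(h_t * h_p)
    # Degenerate case (a single class or cluster): defined as 0 here by convention.
    return float(mi / denom) if denom > 0 else 0.0


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical partitions up to relabeling
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])    # statistically independent partitions
print(same, unrelated)
```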

Tab. 6 Purity values (%) of different clustering algorithms on different datasets

Algorithm	100leaves	BBC	NGs	Caltech101-20	COIL20	MSRC	synthetic	two Gaussian
NMF	45.52	59.82	49.20	68.15	68.26	52.76	35.55	70.33
CAN	55.13	40.69	27.53	50.57	81.41	32.86	36.33	20.56
K-means	42.90	51.50	25.19	65.79	56.33	54.19	56.76	82.50
GBS	57.11	47.89	95.54	64.33	82.57	61.90	53.50	90.00
Multi-NMF	68.84	45.04	27.33	68.55	68.71	59.94	26.50	60.50
CoregSC	80.33	51.29	29.40	69.02	71.46	78.50	56.39	89.50
Cotrain	73.56	57.09	43.73	68.86	75.56	75.00	53.65	89.75
MLAN	89.81	69.05	96.40	66.60	84.10	80.00	56.00	90.00
GMC	85.06	69.34	98.20	55.49	85.00	79.52	53.50	90.00
RMCGL	89.88	72.70	98.40	70.49	87.29	80.00	61.00	90.00
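
Purity is usually defined by assigning each predicted cluster to its most frequent true class and counting the fraction of samples that fall in that class. The sketch below assumes this usual definition, which the page does not state explicitly; the function name `purity` and the toy labels are hypothetical.

```python
import numpy as np


def purity(y_true, y_pred):
    """Fraction of samples belonging to the majority class of their cluster (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    counts = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(counts, (p, t), 1)               # contingency table: clusters x classes
    return counts.max(axis=1).sum() / y_true.size


# Toy example: cluster 0 holds two samples of class 1 and one of class 2.
pur = purity([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
print(f"Purity = {100 * pur:.2f}%")
```

Unlike ACC, purity lets several clusters map to the same class, so it does not penalize over-segmentation.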

Tab. 7 Comparison of empirical p-values on four cancer datasets

Algorithm	Breast	Colon	GBM	Melanoma
Multi-NMF	0.5169	0.4491	7.18E-03	0.4536
CoregSC	0.1189	0.5708	0.5060	4.87E-04
Cotrain	0.0757	0.4860	0.0210	0.1800
MLAN	0.1165	0.7194	5.44E-04	1.31E-03
GBS	0.0679	0.8438	4.44E-04	0.0266
GMC	0.1081	0.8638	2.98E-04	0.0231
RMCGL	0.0207	0.0335	1.65E-04	3.17E-04

Fig. 2 Survival analysis curve of RMCGL algorithm on GBM and Melanoma datasets

Fig. 3 Heat maps of the consistency graph G learned by RMCGL on NGs and two Gaussian datasets

Tab. 8 ACC results (%) of ablation experiments

Algorithm	NGs	Caltech101-20	MSRC	COIL20
RC	66.80	58.51	55.71	80.42
RL	96.80	57.67	57.62	80.97
RMCGL	98.40	60.90	78.57	83.75

Fig. 4 ACC results of different datasets at fixed parameters after removing error regularization term

Fig. 5 Convergence curves of RMCGL algorithm on 100leaves and BBC datasets

Tab. 9 NMI values (%) on NGs dataset under different parameter settings

$\lambda_A$ (rows) / $\lambda_C$ (columns)	0.01	0.1	1	10	100
0.01	86.73	88.92	90.05	90.96	92.18
0.1	90.55	64.52	91.48	90.96	91.65
1	90.26	92.52	91.26	92.70	91.43
10	92.13	90.50	91.65	92.87	90.43
100	92.51	92.05	90.63	94.44	91.90

Tab. 10 NMI values (%) on COIL20 dataset under different parameter settings

$\lambda_A$ (rows) / $\lambda_C$ (columns)	0.01	0.1	1	10	100
0.01	93.11	92.98	92.94	93.06	94.65
0.1	92.95	94.64	92.94	93.06	93.60
1	94.41	94.45	93.55	93.10	92.55
10	94.62	94.64	92.95	94.65	93.27
100	92.58	92.94	92.50	92.95	93.24

Fig. 6 Parameter analysis of RMCGL algorithm on NGs and COIL20 datasets
