Multi-view ensemble clustering algorithm based on view-wise mutual information weighting

Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1713-1718. DOI: 10.11772/j.issn.1001-9081.2022060925

Special topic: the 37th CCF China Conference on Computer Applications (CCF NCCA 2022)

Jinghuan LAO¹, Dong HUANG¹, Changdong WANG², Jianhuang LAI²

¹ College of Mathematics and Informatics, South China Agricultural University, Guangzhou Guangdong 510642, China
² School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou Guangdong 510006, China

Received: 2022-06-27 Revised: 2022-10-18 Accepted: 2022-10-20 Online: 2022-12-02 Published: 2023-06-10
Corresponding author: Dong HUANG
About the authors:
LAO Jinghuan, born in 1996, native of Zhanjiang, Guangdong, M. S. candidate, CCF member. Her research interests include multi-view clustering and large-scale clustering.
HUANG Dong, born in 1987, native of Heyuan, Guangdong, Ph. D., associate professor, CCF member. His research interests include data mining and machine learning. Email: huangdonghere@gmail.com
WANG Changdong, born in 1984, native of Heyuan, Guangdong, Ph. D., associate professor, Ph. D. supervisor, CCF member. His research interests include data mining and machine learning.
LAI Jianhuang, born in 1964, native of Puning, Guangdong, Ph. D., professor, Ph. D. supervisor, CCF Distinguished Member. His research interests include biometrics, digital image processing, pattern recognition and machine learning.
Supported by:
National Natural Science Foundation of China (61976097); Natural Science Foundation of Guangdong Province (2021A1515012203)

Abstract

Many existing multi-view clustering algorithms cannot estimate the reliability of each view and therefore cannot weight the views accordingly. Algorithms that do weight views generally rely on iteratively optimizing a specific objective function, so their real-world performance depends heavily on how well that objective function suits the data and on how carefully some sensitive hyperparameters are tuned. To address these problems, a Multi-view Ensemble Clustering algorithm based on View-wise Mutual Information Weighting (MEC-VMIW) was proposed. Its overall process consists of two phases: the view-wise mutual weighting phase and the multi-view ensemble clustering phase. In the view-wise mutual weighting phase, the dataset was randomly down-sampled multiple times to reduce the problem size of the evaluation and weighting process, and a set of down-sampled clusterings was constructed for the multiple views. The reliability of each view was then estimated from multiple rounds of mutual evaluation among the clustering results of different views, and the views were weighted accordingly. In the multi-view ensemble clustering phase, an ensemble of base clusterings was constructed for each view, the weighted base clustering sets were jointly modeled as a bipartite graph, and the final multi-view clustering result was obtained by an efficient bipartite graph partitioning algorithm. Experiments on several multi-view datasets confirm the robust clustering performance of the proposed algorithm.

Key words: data clustering, multi-view clustering, mutual information, ensemble clustering, view weighting, bipartite graph
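The abstract describes both phases only in prose. As a rough illustration, the Python sketch below follows that outline under stated assumptions; it is not the authors' implementation, and the helper names (`nmi`, `kmeans`, `view_weights`, `consensus`) and parameters (`n_rounds`, `frac`, `n_base`) are hypothetical. Three choices are our own: (1) in each round every view clusters the same random down-sample with k-means, and a view's reliability score is its average normalized mutual information (NMI) with the other views, summed over rounds and normalized to sum to one; (2) base clusterings are k-means runs with a random number of clusters in [k, 2k]; (3) the weighted object-by-cluster bipartite graph is partitioned with a standard SVD-based spectral step (Ng-Jordan-Weiss style row-normalized embedding followed by k-means), swapped in for whatever efficient bipartite partitioning the paper actually uses.

```python
import numpy as np


def nmi(a, b):
    """NMI between two labelings, normalized by sqrt(H(a) * H(b))."""
    a = np.unique(a, return_inverse=True)[1]
    b = np.unique(b, return_inverse=True)[1]
    p = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(p, (a, b), 1.0)  # contingency table of label pairs
    p /= a.size
    pa, pb = p.sum(axis=1), p.sum(axis=0)  # marginals, strictly positive here
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    return float(mi / np.sqrt(ha * hb)) if ha > 0 and hb > 0 else 0.0


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (first center random)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def view_weights(views, k, n_rounds=10, frac=0.3, rng=None):
    """Phase 1 (assumed scheme, needs >= 2 views): score each view by its mean NMI
    with the other views on repeated random down-samples; normalize to sum to 1."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, n_views = len(views[0]), len(views)
    scores = np.zeros(n_views)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=max(k, int(frac * n)), replace=False)  # down-sample
        labels = [kmeans(X[idx], k, rng) for X in views]
        for v in range(n_views):
            scores[v] += np.mean([nmi(labels[v], labels[u])
                                  for u in range(n_views) if u != v])
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_views, 1.0 / n_views)


def consensus(views, weights, k, n_base=10, rng=None):
    """Phase 2: weighted object-by-cluster bipartite graph, partitioned with an
    SVD-based spectral step (a stand-in, not necessarily the paper's partitioner)."""
    rng = np.random.default_rng(1) if rng is None else rng
    blocks = []
    for X, w in zip(views, weights):
        for _ in range(n_base):
            kb = int(rng.integers(k, 2 * k + 1))  # random cluster count in [k, 2k]
            blocks.append(w * np.eye(kb)[kmeans(X, kb, rng)])  # one-hot, scaled by view weight
    B = np.hstack(blocks)
    B = B[:, B.sum(axis=0) > 0]  # drop empty clusters and zero-weight views
    dx, dy = B.sum(axis=1), B.sum(axis=0)
    Bn = B / np.sqrt(dx)[:, None] / np.sqrt(dy)[None, :]  # degree-normalized biadjacency
    U = np.linalg.svd(Bn, full_matrices=False)[0][:, :k]  # top-k left singular vectors
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    return kmeans(U, k, rng)
```

Usage, as a sketch: call `w = view_weights(views, k)` on a list of per-view feature matrices with the same rows, then `consensus(views, w, k)`. With two informative views and one pure-noise view, the noise view should receive a much smaller weight, so the consensus partition is driven mainly by the informative views. Changing the reliability rule only touches `view_weights`; the bipartite construction stays the same.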

CLC number: TP391.1

Jinghuan LAO, Dong HUANG, Changdong WANG, Jianhuang LAI. Multi-view ensemble clustering algorithm based on view-wise mutual information weighting[J]. Journal of Computer Applications, 2023, 43(6): 1713-1718.

Figures/Tables (8)


Dataset	Views	Samples	Classes
3Sources	3	169	6
Notting-Hill	3	550	5
Reuters	5	1 200	6
Mfeat	3	1 200	10
Caltech-7	3	1 474	7
Caltech-20	3	2 386	20

Dataset	SC-Avg	SC-Best	RMSC	AWP	MCGC	CoMSC	MEC-VMIW
Average score	46.95±0.22	51.77±0.33	28.23±1.21	54.16±0.00	26.56±0.00	47.33±0.22	60.80±3.37
3Sources	45.41±0.48	50.01±0.76	60.31±1.12	62.88±0.00	58.72±0.00	46.74±0.51	68.83±2.52
Notting-Hill	66.80±0.12	72.02±0.00	40.47±2.09	60.63±0.00	0.85±0.00	54.82±0.00	80.64±5.77
Reuters	25.62±0.11	27.00±0.21	10.60±0.37	11.40±0.00	9.53±0.00	13.21±0.05	23.55±4.90
Mfeat	57.56±0.10	64.89±0.18	23.27±0.71	78.26±0.00	59.38±0.00	70.52±0.17	81.92±3.02
Caltech-7	36.95±0.05	40.65±0.01	10.95±0.53	52.47±0.00	0.20±0.00	42.70±0.07	52.80±2.81
Caltech-20	49.37±0.45	56.04±0.80	23.79±2.43	59.31±0.00	30.66±0.00	56.00±0.55	57.05±1.21

Dataset	SC-Avg	SC-Best	RMSC	AWP	MCGC	CoMSC	MEC-VMIW
Average score	35.76±0.30	41.54±0.48	13.25±0.98	45.20±0.00	20.50±0.00	37.01±0.63	53.61±5.21
3Sources	29.62±0.55	35.44±0.86	45.26±1.35	51.84±0.00	50.62±0.00	45.75±1.00	66.13±2.44
Notting-Hill	63.44±0.14	73.52±0.00	26.22±2.31	56.00±0.00	0.22±0.00	40.52±0.00	75.06±11.14
Reuters	19.37±0.10	20.46±0.12	1.97±0.16	2.42±0.00	1.64±0.00	4.90±0.03	16.99±5.98
Mfeat	45.18±0.14	55.14±0.25	3.94±0.56	68.71±0.00	49.75±0.00	60.04±0.17	74.76±5.32
Caltech-7	28.78±0.05	30.11±0.05	-0.12±0.12	50.50±0.00	-0.41±0.00	32.26±0.16	49.56±4.51
Caltech-20	28.17±0.83	34.56±1.60	2.23±1.38	41.71±0.00	21.22±0.00	38.57±2.41	39.18±1.89
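Each results table reports an "Average score" row without saying how it is formed. Assuming it is the unweighted mean over the six datasets, taken separately for the means and for the standard deviations, the short sketch below recomputes it for the MEC-VMIW column of both tables (values copied from the tables; the helper names `parse`, `column_average` and `matches_reported` are hypothetical). Under that assumption it reproduces the reported 60.80±3.37 and 53.61±5.21.

```python
# MEC-VMIW "mean±std" cells copied from the two results tables above
# (order: 3Sources, Notting-Hill, Reuters, Mfeat, Caltech-7, Caltech-20),
# plus the reported "Average score" cell of each table.
TABLES = {
    "table_1": {
        "rows": ["68.83±2.52", "80.64±5.77", "23.55±4.90",
                 "81.92±3.02", "52.80±2.81", "57.05±1.21"],
        "average": "60.80±3.37",
    },
    "table_2": {
        "rows": ["66.13±2.44", "75.06±11.14", "16.99±5.98",
                 "74.76±5.32", "49.56±4.51", "39.18±1.89"],
        "average": "53.61±5.21",
    },
}


def parse(entry: str) -> tuple[float, float]:
    """Split a 'mean±std' cell into two floats."""
    mean, std = entry.split("±")
    return float(mean), float(std)


def column_average(rows: list[str]) -> tuple[float, float]:
    """Assumed rule: average the means and the stds separately over datasets."""
    pairs = [parse(r) for r in rows]
    n = len(pairs)
    return sum(m for m, _ in pairs) / n, sum(s for _, s in pairs) / n


def matches_reported(rows: list[str], average: str, tol: float = 0.006) -> bool:
    """True if the recomputed values agree with the reported 2-decimal cell."""
    mean, std = column_average(rows)
    rep_mean, rep_std = parse(average)
    return abs(mean - rep_mean) <= tol and abs(std - rep_std) <= tol


for name, t in TABLES.items():
    mean, std = column_average(t["rows"])
    print(f"{name}: {mean:.2f}±{std:.2f} (reported {t['average']})")
```

Running it prints `table_1: 60.80±3.37 (reported 60.80±3.37)` and `table_2: 53.61±5.21 (reported 53.61±5.21)`.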
