Journal of Computer Applications, 2023, Vol. 43, Issue 6: 1713-1718. DOI: 10.11772/j.issn.1001-9081.2022060925
Special topic: The 37th CCF National Conference of Computer Applications (CCF NCCA 2022)
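Received: 2022-06-27
Revised: 2022-10-18
Accepted: 2022-10-20
Online: 2022-12-02
Published: 2023-06-10
Corresponding author: Dong HUANG
About the author: LAO Jinghuan, born in 1996 in Zhanjiang, Guangdong, M.S. candidate, CCF member. Her research interests include multi-view clustering and large-scale clustering.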
Jinghuan LAO, Dong HUANG, Changdong WANG, Jianhuang LAI
Abstract:
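Existing multi-view clustering algorithms often cannot evaluate the reliability of each view or weight the views accordingly. Those that do support view weighting usually rely on the iterative optimization of a specific objective function, so the suitability of that objective function and the tuning of some sensitive hyperparameters significantly affect practical use. To address these problems, a Multi-view Ensemble Clustering algorithm based on View-wise Mutual Information Weighting (MEC-VMIW) is proposed. Its overall process consists of two stages: the view-wise mutual information weighting stage and the multi-view ensemble clustering stage. In the view-wise mutual information weighting stage, the dataset is randomly downsampled multiple times to reduce the problem size of the evaluation and weighting process, and a set of multi-view downsampled clusterings is then constructed. The reliability of each view is evaluated through multiple rounds of mutual evaluation among the clustering results of different views, and the views are weighted accordingly. In the multi-view ensemble clustering stage, a set of base clusterings is constructed for each view, the multiple base clustering sets are modeled with weights into a bipartite graph structure, and an efficient bipartite graph partitioning algorithm is used to obtain the final multi-view clustering result. Experimental results on several multi-view datasets verify the robust clustering performance of the proposed multi-view ensemble clustering algorithm.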
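The abstract does not give the weighting formula, so the following is only a minimal Python sketch of the mutual-evaluation idea under stated assumptions: a view's reliability is taken to be its mean NMI with the clusterings of the other views, averaged over the downsampling rounds, and the weights are the normalized reliabilities. The names `nmi` and `view_weights`, the input layout `rounds[r][v]`, and the normalization are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def nmi(a, b):
    """NMI between two labelings, normalized by sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        nab / n * math.log(n * nab / (ca[x] * cb[y]))
        for (x, y), nab in cab.items()
    )
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        return 0.0  # a single-cluster labeling carries no information
    return mi / math.sqrt(ha * hb)


def view_weights(rounds):
    """Assumed scoring rule (not the paper's formula).

    rounds[r][v] is the label list produced from view v on the r-th
    random subsample. Each view is scored by its mean NMI with all
    other views, summed over rounds, then scores are normalized.
    """
    n_views = len(rounds[0])
    if n_views < 2:
        raise ValueError("mutual evaluation needs at least two views")
    scores = [0.0] * n_views
    for labelings in rounds:
        for v in range(n_views):
            others = [
                nmi(labelings[v], labelings[u])
                for u in range(n_views)
                if u != v
            ]
            scores[v] += sum(others) / len(others)
    total = sum(scores)
    if total == 0.0:
        return [1.0 / n_views] * n_views  # no agreement at all: uniform
    return [s / total for s in scores]


# Toy input: one round, three views, four sampled objects.
toy_rounds = [[[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]]
toy_weights = view_weights(toy_rounds)
```

In this toy round, views 0 and 1 produce the same partition under different label names while view 2 is statistically independent of both, so the two agreeing views split the weight and view 2 receives approximately zero.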
Cite this article: Jinghuan LAO, Dong HUANG, Changdong WANG, Jianhuang LAI. Multi-view ensemble clustering algorithm based on view-wise mutual information weighting[J]. Journal of Computer Applications, 2023, 43(6): 1713-1718.
Dataset | Number of views | Number of samples | Number of classes |
---|---|---|---|
3Sources | 3 | 169 | 6 |
Notting-Hill | 3 | 550 | 5 |
Reuters | 5 | 1 200 | 6 |
Mfeat | 3 | 1 200 | 10 |
Caltech-7 | 3 | 1 474 | 7 |
Caltech-20 | 3 | 2 386 | 20 |
Tab. 1 Experimental datasets
Dataset | SC-Avg | SC-Best | RMSC | AWP | MCGC | CoMSC | MEC-VMIW |
---|---|---|---|---|---|---|---|
Average | 46.95±0.22 | 51.77±0.33 | 28.23±1.21 | 54.16±0.00 | 26.56±0.00 | 47.33±0.22 | 60.80±3.37 |
3Sources | 45.41±0.48 | 50.01±0.76 | 60.31±1.12 | 62.88±0.00 | 58.72±0.00 | 46.74±0.51 | 68.83±2.52 |
Notting-Hill | 66.80±0.12 | 72.02±0.00 | 40.47±2.09 | 60.63±0.00 | 0.85±0.00 | 54.82±0.00 | 80.64±5.77 |
Reuters | 25.62±0.11 | 27.00±0.21 | 10.60±0.37 | 11.40±0.00 | 9.53±0.00 | 13.21±0.05 | 23.55±4.90 |
Mfeat | 57.56±0.10 | 64.89±0.18 | 23.27±0.71 | 78.26±0.00 | 59.38±0.00 | 70.52±0.17 | 81.92±3.02 |
Caltech-7 | 36.95±0.05 | 40.65±0.01 | 10.95±0.53 | 52.47±0.00 | 0.20±0.00 | 42.70±0.07 | 52.80±2.81 |
Caltech-20 | 49.37±0.45 | 56.04±0.80 | 23.79±2.43 | 59.31±0.00 | 30.66±0.00 | 56.00±0.55 | 57.05±1.21 |
Tab. 2 NMI scores (%) of the proposed algorithm and other clustering algorithms
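This page does not say which normalization of NMI the scores use. As a reference point only, one widely used definition (the geometric-mean normalization of Strehl and Ghosh) is written out below. The symbols are introduced here for illustration: $n$ is the number of objects, $n_i^{a}$ and $n_j^{b}$ are the sizes of cluster $i$ in clustering $\pi^{a}$ and cluster $j$ in clustering $\pi^{b}$, and $n_{ij}$ is the number of objects they share.

```latex
\mathrm{NMI}(\pi^{a}, \pi^{b}) =
\frac{\displaystyle\sum_{i}\sum_{j} n_{ij}\,
      \log\frac{n\, n_{ij}}{n_i^{a}\, n_j^{b}}}
     {\sqrt{\left(\displaystyle\sum_{i} n_i^{a}\log\frac{n_i^{a}}{n}\right)
            \left(\displaystyle\sum_{j} n_j^{b}\log\frac{n_j^{b}}{n}\right)}}
```

Under this definition the score lies in $[0, 1]$ and equals 1 when the two clusterings are identical up to a renaming of the clusters; the table reports such scores multiplied by 100.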
Dataset | SC-Avg | SC-Best | RMSC | AWP | MCGC | CoMSC | MEC-VMIW |
---|---|---|---|---|---|---|---|
Average | 35.76±0.30 | 41.54±0.48 | 13.25±0.98 | 45.20±0.00 | 20.50±0.00 | 37.01±0.63 | 53.61±5.21 |
3Sources | 29.62±0.55 | 35.44±0.86 | 45.26±1.35 | 51.84±0.00 | 50.62±0.00 | 45.75±1.00 | 66.13±2.44 |
Notting-Hill | 63.44±0.14 | 73.52±0.00 | 26.22±2.31 | 56.00±0.00 | 0.22±0.00 | 40.52±0.00 | 75.06±11.14 |
Reuters | 19.37±0.10 | 20.46±0.12 | 1.97±0.16 | 2.42±0.00 | 1.64±0.00 | 4.90±0.03 | 16.99±5.98 |
Mfeat | 45.18±0.14 | 55.14±0.25 | 3.94±0.56 | 68.71±0.00 | 49.75±0.00 | 60.04±0.17 | 74.76±5.32 |
Caltech-7 | 28.78±0.05 | 30.11±0.05 | -0.12±0.12 | 50.50±0.00 | -0.41±0.00 | 32.26±0.16 | 49.56±4.51 |
Caltech-20 | 28.17±0.83 | 34.56±1.60 | 2.23±1.38 | 41.71±0.00 | 21.22±0.00 | 38.57±2.41 | 39.18±1.89 |
Tab. 3 ARI scores (%) of the proposed algorithm and other clustering algorithms
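For readers unfamiliar with the metric, the sketch below computes the standard adjusted Rand index from pair counts using only the Python standard library. It is an illustration of the general definition, not the authors' evaluation code, and the function name `adjusted_rand_index` is chosen here. It also shows why a few entries in the table are slightly negative: ARI drops below zero when two labelings agree less often than chance would predict.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Standard ARI from pair counts; needs at least two objects."""
    n = len(truth)
    # Pairs of objects placed together in both labelings.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs placed together in each labeling on its own.
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_joint - expected) / (max_index - expected)


same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Here `same_partition` is 1 because the two labelings differ only in cluster names, while `crossed_partition` is negative because no pair of objects is grouped together in both labelings.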