Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1713-1718. DOI: 10.11772/j.issn.1001-9081.2022060925

CCF 37th National Conference of Computer Applications (CCF NCCA 2022)

Multi-view ensemble clustering algorithm based on view-wise mutual information weighting

Jinghuan LAO1, Dong HUANG1, Changdong WANG2, Jianhuang LAI2

  1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou Guangdong 510642, China
  2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou Guangdong 510006, China
  • Received: 2022-06-27  Revised: 2022-10-18  Accepted: 2022-10-20  Online: 2022-12-02  Published: 2023-06-10
  • Corresponding author: Dong HUANG (Email: huangdonghere@gmail.com)
  • About the authors:
    LAO Jinghuan, born in 1996, from Zhanjiang, Guangdong, M. S. candidate, CCF member. Her research interests include multi-view clustering and large-scale clustering.
    HUANG Dong, born in 1987, from Heyuan, Guangdong, Ph. D., associate professor, CCF member. His research interests include data mining and machine learning.
    WANG Changdong, born in 1984, from Heyuan, Guangdong, Ph. D., associate professor, doctoral supervisor, CCF member. His research interests include data mining and machine learning.
    LAI Jianhuang, born in 1964, from Puning, Guangdong, Ph. D., professor, doctoral supervisor, CCF distinguished member. His research interests include biometrics, digital image processing, pattern recognition and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61976097); Natural Science Foundation of Guangdong Province (2021A1515012203)


Abstract:

Many existing multi-view clustering algorithms cannot estimate the reliability of each view and weight the views accordingly, while those that can weight views generally rely on the iterative optimization of a specific objective function, so their real-world performance can be significantly affected by how well that objective function suits the application and by how sensibly some sensitive hyperparameters are tuned. To address these problems, a Multi-view Ensemble Clustering algorithm based on View-wise Mutual Information Weighting (MEC-VMIW) was proposed, which consists of two phases: the view-wise mutual weighting phase and the multi-view ensemble clustering phase. In the view-wise mutual weighting phase, the dataset was randomly down-sampled multiple times to reduce the problem size of the evaluation and weighting process, and a set of down-sampled clusterings was constructed for the multiple views. The reliability of each view was then estimated through multiple rounds of mutual evaluation among the clustering results of different views, and the views were weighted accordingly. In the multi-view ensemble clustering phase, an ensemble of base clusterings was constructed for each view, and the multiple base clustering sets were weighted and modeled as a bipartite graph. The final multi-view clustering result was obtained by efficient bipartite graph partitioning. Experimental results on several multi-view datasets confirm the robust clustering performance of the proposed algorithm.
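The abstract does not give the exact reliability formula, so the following is only a minimal sketch of one plausible reading of the view-wise mutual weighting phase, assuming NumPy. In each round, every view is clustered on the same random down-sample; each view is then scored by its average normalized mutual information (NMI) with the other views, and the scores are averaged over rounds and normalized into weights. The helper names `nmi`, `downsampled_clusterings` and `view_weights`, the `cluster_fn` placeholder (e.g. any k-means routine), and the choice of the sqrt(H(a)·H(b)) NMI variant are all hypothetical assumptions, not the authors' definitions.

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information between two labelings of the same objects,
    using sqrt(H(a) * H(b)) as the normalizer (one common NMI variant)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    # Joint distribution estimated from the contingency table of the two labelings.
    pxy = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(pxy, (ai, bi), 1.0)
    pxy /= ai.size
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px * np.log(px))
    hy = -np.sum(py * np.log(py))
    if hx == 0.0 or hy == 0.0:  # at least one labeling has a single cluster
        return 1.0 if hx == hy else 0.0
    return float(mi / np.sqrt(hx * hy))


def downsampled_clusterings(views, cluster_fn, n_rounds, sample_size, seed=0):
    """Hypothetical helper: in each round, draw one random down-sample of the
    objects and cluster every view (an n x d_v array) on that same subset."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    rounds = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=sample_size, replace=False)
        rounds.append([np.asarray(cluster_fn(X[idx])) for X in views])
    return rounds


def view_weights(rounds):
    """Score each view by its mean NMI with every other view (mutual evaluation),
    average the scores over rounds, and normalize them to sum to 1.
    Assumes at least two views."""
    n_views = len(rounds[0])
    scores = np.zeros(n_views)
    for labels in rounds:
        for v in range(n_views):
            others = [nmi(labels[v], labels[u]) for u in range(n_views) if u != v]
            scores[v] += np.mean(others)
    scores /= len(rounds)
    return scores / scores.sum()
```

For example, if two views produce the same partition of a down-sample (up to relabeling) and a third view disagrees with both, the third view receives the smallest weight. Evaluating on small random subsets keeps each NMI computation cheap, which matches the abstract's stated motivation for down-sampling.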

Key words: data clustering, multi-view clustering, mutual information, ensemble clustering, view weighting, bipartite graph
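The multi-view ensemble clustering phase can be illustrated with a small sketch, assuming NumPy. Objects and all base clusters become the two sides of a bipartite graph, and each object is linked to every cluster it belongs to with an edge weight equal to its view's weight. This is one plausible reading of "weighted and modeled as a bipartite graph", not the paper's exact construction. The abstract does not name its partitioning algorithm, so the sketch swaps in a plain spectral bipartite partition: the top-k left singular vectors of D_x^{-1/2} B D_c^{-1/2} are computed and then clustered with a minimal k-means. Because a full SVD is used, this is not an efficient method for large datasets. All function names are hypothetical.

```python
import numpy as np


def weighted_bipartite(base_clusterings, weights):
    """Build the object-to-cluster affinity matrix B (n objects x all base clusters).

    base_clusterings[v] is the list of base clusterings (1-D label arrays) of
    view v; weights[v] is that view's weight (assumed positive). Object i is
    linked to each cluster it belongs to with edge weight weights[v]."""
    blocks = []
    for w, ensemble in zip(weights, base_clusterings):
        for labels in ensemble:
            _, inv = np.unique(np.asarray(labels), return_inverse=True)
            block = np.zeros((inv.size, inv.max() + 1))
            block[np.arange(inv.size), inv] = w
            blocks.append(block)
    return np.hstack(blocks)


def _kmeans(X, k, n_iter=100):
    """Minimal Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center, then recompute the centers.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def partition_bipartite(B, k):
    """Spectral partition of the bipartite graph: embed objects with the top-k left
    singular vectors of D_x^{-1/2} B D_c^{-1/2}, then run k-means on the rows."""
    dx = B.sum(axis=1)  # object-side degrees
    dc = B.sum(axis=0)  # cluster-side degrees
    Bn = B / np.sqrt(dx)[:, None] / np.sqrt(dc)[None, :]
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    emb = U[:, :k]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return _kmeans(emb, k)
```

With two views whose base clusterings all respect the same two groups, the bipartite graph splits into two connected components, and the sketch recovers those groups. In general, a view with a higher weight contributes heavier edges and therefore pulls the final partition more strongly.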
