Journal of Computer Applications, 2024, Vol. 44, Issue (10): 3267-3274. DOI: 10.11772/j.issn.1001-9081.2023101481

• The 40th CCF National Database Conference (NDBC 2023)

Multi-view clustering network guided by graph contrastive learning

Yunhua ZHU¹, Bing KONG¹, Lihua ZHOU¹, Hongmei CHEN¹, Chongming BAO²

  1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650504, China
  2. School of Software, Yunnan University, Kunming, Yunnan 650504, China
  • Received: 2023-10-30; Revised: 2023-12-07; Accepted: 2023-12-26; Online: 2024-10-15; Published: 2024-10-10
  • Contact: Bing KONG
  • About authors: ZHU Yunhua, born in 1998, male, native of Chongqing, M. S. candidate, CCF member. His research interests include data mining and multi-view clustering.
    KONG Bing, born in 1968, male, native of Kunming, Yunnan, Ph. D., associate professor. His research interests include data mining, machine learning and social network analysis. Email: kongbing@ynu.edu.cn
    ZHOU Lihua, born in 1968, female, native of Huaping, Yunnan, Ph. D., professor, CCF member. Her research interests include data mining, multi-view learning and social network analysis.
    CHEN Hongmei, born in 1976, female, native of Chongqing, Ph. D., associate professor, CCF member. Her research interests include spatial data mining.
    BAO Chongming, born in 1971, male, native of Xuanwei, Yunnan, M. S., assistant research fellow. His research interests include social network analysis and machine learning.
  • Supported by: National Natural Science Foundation of China (62062066); Key Project of Yunnan Provincial Basic Research Plan (202201AS070015)

Abstract:

Multi-view clustering has attracted much attention because it can exploit information about the data from multiple perspectives. However, current multi-view clustering algorithms generally suffer from two issues: 1) they focus on either the attribute features or the structural features of the data, without fully integrating both to improve the quality of the latent embeddings; 2) methods based on Graph Neural Networks (GNNs) can use attribute and structural information simultaneously, but models built on graph convolution or graph attention tend to produce over-smoothed representations when the network becomes too deep. To address these problems, a Multi-view Clustering Network guided by Graph Contrastive Learning (MCNGCL) was proposed. Firstly, a multi-view autoencoder module was used to capture the private representation of each view. Secondly, a common representation was constructed through adaptively weighted fusion. Thirdly, a graph contrastive learning module was incorporated so that adjacent nodes are more easily assigned to the same cluster, which also alleviates over-smoothing when neighbor information is aggregated. Finally, a self-supervised clustering module was used to optimize both the common representation and the private representations of the views toward a clustering-friendly structure. Experimental results show that MCNGCL performs well on multiple datasets. For example, on the 3sources dataset, compared with the second-best method, Consistent Multiple Graph Embedding for multi-view Clustering (CMGEC), MCNGCL improves accuracy by 2.83 percentage points and Normalized Mutual Information (NMI) by 3.70 percentage points. Ablation experiments and parameter sensitivity analysis further confirm the effectiveness of MCNGCL.
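Besides the autoencoders, the abstract names three components: adaptively weighted fusion, a graph contrastive module, and a self-supervised clustering module. The NumPy sketch below shows one plausible form of each. It is not the authors' implementation, and all function names are hypothetical. The sketch rests on these assumptions:

- Fusion weights are a softmax over per-view scores (`view_logits`, hypothetical).
- The contrastive term is swapped in from the Graph-MLP neighbor contrastive loss, which treats graph neighbors as positives and all other nodes as negatives, with temperature `tau`.
- The clustering module follows DEC: a Student's t soft assignment, a sharpened target distribution, and a KL divergence between them.

Only forward computations are shown.

```python
import numpy as np

# Illustrative sketch only: function names and exact loss forms are assumptions,
# not the MCNGCL authors' code. Forward computations only; no training loop.

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_views(private_reps, view_logits):
    """Assumed adaptive fusion: H = sum_v w_v * Z_v with w = softmax(view_logits)."""
    weights = softmax(np.asarray(view_logits, dtype=float))
    common = sum(w * z for w, z in zip(weights, private_reps))
    return common, weights

def neighbor_contrastive_loss(z, adj, tau=0.5):
    """Graph-MLP-style loss: neighbours are positives, all other nodes negatives.

    Nodes with no neighbours are skipped; self-pairs are excluded throughout."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)  # unit rows -> cosine
    off_diag = ~np.eye(z.shape[0], dtype=bool)
    sim = np.exp(z @ z.T / tau) * off_diag        # exp(cos / tau), self removed
    pos = (sim * (adj > 0)).sum(axis=1)           # similarity mass on neighbours
    total = sim.sum(axis=1)                       # similarity mass on all other nodes
    has_nb = pos > 0
    return float(-np.mean(np.log(pos[has_nb] / total[has_nb])))

def soft_assign(z, centers, alpha=1.0):
    """DEC-style Student's t soft assignment q_ij of node i to cluster j."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """DEC target: p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), averaged over nodes; P is treated as a fixed target."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))) / p.shape[0])

# Toy example: 6 nodes in two groups {0,1,2} and {3,4,5}, seen through two noisy views.
rng = np.random.default_rng(0)
base = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
views = [base + 0.05 * rng.standard_normal(base.shape) for _ in range(2)]
adj = np.kron(np.eye(2), np.ones((3, 3)))         # edges only inside each group

H, w = fuse_views(views, view_logits=[0.0, 0.0])  # equal scores -> equal weights
centers = np.array([[1.0, 0.0], [0.0, 1.0]])      # hypothetical initial centroids
Q = soft_assign(H, centers)
P = target_distribution(Q)
print("view weights:", w)
print("contrastive loss:", neighbor_contrastive_loss(H, adj))
print("KL loss:", kl_clustering_loss(P, Q))
print("cluster ids:", Q.argmax(axis=1))
```

A lower contrastive term means each node's embedding sits closer to its graph neighbors than to other nodes. This matches the abstract's goal of placing adjacent nodes in the same cluster, while the KL term sharpens the soft assignments. In a trained model, both losses would be backpropagated through the autoencoders and the fusion weights, which this sketch does not do.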

Key words: multi-view clustering, contrastive learning, representation learning, self-supervised clustering, deep learning
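The gains reported in the abstract are in clustering accuracy (ACC) and NMI. Cluster IDs are arbitrary, so ACC is usually computed under the best one-to-one mapping between predicted clusters and true classes. The following is a minimal pure-Python sketch with hypothetical function names. It brute-forces label mappings, which is practical only for a handful of clusters; the Hungarian algorithm is the standard choice for larger k. For NMI it assumes arithmetic-mean normalization, which the abstract does not specify.

```python
import itertools
import math
from collections import Counter

def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one mapping from predicted clusters to true classes.

    Brute-forces every injective mapping, so it is only practical for small k;
    the Hungarian algorithm gives the same result efficiently for larger k."""
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    counts = Counter(zip(y_pred, y_true))         # (predicted, true) co-occurrences
    # Pad with None so surplus predicted clusters can stay unmatched (0 hits).
    padded = true_labels + [None] * max(0, len(pred_labels) - len(true_labels))
    best = max(
        sum(counts[(p, t)] for p, t in zip(pred_labels, mapping))
        for mapping in itertools.permutations(padded, len(pred_labels))
    )
    return best / len(y_true)

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(y_true, y_pred):
    """Mutual information between two labelings, from their contingency counts."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    return sum(
        (c / n) * math.log(c * n / (ct[t] * cp[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )

def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalisation (an assumption; other variants exist)."""
    h = entropy(y_true) + entropy(y_pred)
    return 1.0 if h == 0 else 2.0 * mutual_information(y_true, y_pred) / h

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]          # same partition, different cluster ids
print(clustering_accuracy(y_true, y_pred), nmi(y_true, y_pred))
```

Both metrics are reported as percentages. The "2.83 percentage points" in the abstract is therefore an absolute difference in ACC × 100, not a relative improvement.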
