Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1372-1378. DOI: 10.11772/j.issn.1001-9081.2024071001

• 10th China Conference on Data Mining •

Multi-source data representation learning model based on tensorized graph convolutional network and contrastive learning

Yufei LONG, Yuchen MOU, Ye LIU

  1. School of Future Technology, South China University of Technology, Guangzhou, Guangdong 510641, China
  • Received: 2024-07-23 Revised: 2024-09-07 Accepted: 2024-09-10 Online: 2024-09-25 Published: 2025-05-10
  • Contact: Ye LIU
  • About the authors: LONG Yufei, born in 2003 in Hengyang, Hunan. Her research interests include data mining, graph learning, and machine learning.
    MOU Yuchen, born in 2002 in Zibo, Shandong, CCF member. His research interests include graph learning, representation learning, and self-supervised learning.
    LIU Ye, born in 1995 in Hengyang, Hunan, Ph.D., associate professor, doctoral supervisor, CCF member. Her research interests include machine learning and affective computing.
  • Supported by:
    National Training Program of Innovation and Entrepreneurship for Undergraduates (202310561173)

Abstract:

To address the tendency of existing multi-source data representation learning models to overlook high-order associations among data sources and their susceptibility to noise when processing large-scale, complex, high-dimensional data, a multi-source data representation learning model based on a Tensorized Graph Convolutional Network (T-GCN) and contrastive learning, named MS-TGC, was proposed. Firstly, the K-Nearest Neighbors (KNN) algorithm and Graph Convolutional Network (GCN) were used to unify the dimensions of the multi-source data, and the results were concatenated to form tensorized multi-source data. Secondly, a tensor graph convolution operator was defined and applied to perform high-dimensional graph convolution, learning intra-source information and inter-source associations simultaneously. Finally, a multi-source contrastive learning paradigm was constructed: contrastive constraints based on semantic consistency and label consistency were added to improve the representation learning accuracy of MS-TGC on noisy data and to enhance the robustness of the model. Experimental results show that, at a labeled sample ratio of 0.3, MS-TGC achieves semi-supervised classification accuracy 1.36 and 5.53 percentage points higher than that of CONMF (Co-consensus Orthogonal Non-negative Matrix Factorization) on the BDGP and 20newsgroup datasets, respectively. These results indicate that MS-TGC captures inter-source associations more effectively, reduces noise interference, and yields high-quality multi-source data representations.

Key words: multi-source data representation learning, Graph Convolutional Network (GCN), tensor graph convolution operator, contrastive learning, semi-supervised classification
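
The first step described in the abstract (using KNN and GCN to bring every source to a common dimension, then stacking the results into a tensor) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard symmetric-normalized GCN propagation rule, random untrained weights, and hypothetical helper names (`knn_adjacency`, `normalize_adjacency`, `gcn_layer`, `tensorize_sources`). The tensor graph convolution operator and the contrastive constraints are not shown, because the abstract does not define them.

```python
import numpy as np

def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour adjacency (no self-loops) from row features."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a node is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]             # k closest nodes per row
    n = x.shape[0]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(a, a.T)                       # symmetrize

def normalize_adjacency(a):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution followed by ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)

def tensorize_sources(sources, k, out_dim, seed=0):
    """Map each source to out_dim features and stack into one 3-way tensor.

    Assumption: weights are random here; in a real model they would be learned.
    """
    rng = np.random.default_rng(seed)
    embedded = []
    for x in sources:
        a_norm = normalize_adjacency(knn_adjacency(x, k))
        w = rng.normal(scale=1.0 / np.sqrt(x.shape[1]), size=(x.shape[1], out_dim))
        embedded.append(gcn_layer(a_norm, x, w))
    return np.stack(embedded, axis=0)               # (num_sources, n, out_dim)

# Two toy sources describing the same 6 samples with different feature sizes.
rng = np.random.default_rng(1)
sources = [rng.normal(size=(6, 4)), rng.normal(size=(6, 9))]
tensor = tensorize_sources(sources, k=2, out_dim=3)
print(tensor.shape)  # (2, 6, 3)
```

Each source gets its own KNN graph and projection, so sources with different feature dimensions end up as equally sized slices of one tensor, which is the form a tensor-level convolution would then operate on.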

CLC Number: