Journal of Computer Applications

Multi-source data representation learning model based on tensorized graph convolutional network and contrastive learning

  

  • Received: 2024-07-17 Revised: 2024-09-07 Online: 2024-09-25 Published: 2024-09-25

Multi-source data representation learning model based on tensorized convolutional neural network and contrastive learning

龙雨菲,牟宇辰,刘晔   

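  1. South China University of Technology (华南理工大学)
  • Corresponding author: 刘晔
  • Funding:
    National Natural Science Foundation of China (国家自然科学基金)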

Abstract: Existing multi-source data representation learning models, when handling large-scale, complex, high-dimensional data, tend to overlook high-order associations among sources and are susceptible to noise. To address these problems, a multi-source data representation learning model based on tensorized graph convolutional network and contrastive learning, named MS-TGC (Multi-Source Tensorized Graph Convolutional network with contrastive learning), was proposed. First, the K-Nearest Neighbors (KNN) algorithm and Graph Convolutional Network (GCN) were used to unify the dimensions of the multi-source data, forming tensorized multi-source data. Then, a tensor graph convolution operator was applied to perform high-dimensional graph convolution, learning both intra-source and inter-source information. Finally, a multi-source contrastive learning paradigm was constructed, incorporating semantic and label consistency constraints to enhance robustness against noise. Experimental results show that, at a labeled sample rate of 0.3, MS-TGC improves semi-supervised classification accuracy over the CONMF (Co-consensus Orthogonal Non-negative Matrix Factorization) model by 1.36% on the BDGP (Berkeley Drosophila Genome Project) dataset and by 5.53% on the 20newsgroup dataset. These results indicate that MS-TGC effectively captures inter-source correlations, reduces noise interference, and achieves high-quality multi-source data representations.
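The first step of the abstract (KNN plus GCN to unify source dimensions, then stacking into a tensor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`knn_adjacency`, `normalize_adjacency`, `gcn_layer`), the Euclidean distance, the symmetric normalisation, the random weights standing in for learned parameters, and the toy source sizes are all assumptions made for illustration.

```python
import numpy as np


def knn_adjacency(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-nearest-neighbour adjacency (Euclidean), without self-loops."""
    n = x.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(x * x, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # symmetrise the directed KNN graph


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2}, a common GCN normalisation (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN propagation step with ReLU: relu(A_norm X W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


rng = np.random.default_rng(0)
n_samples, hidden_dim, k = 8, 4, 3
# Two hypothetical sources describing the same samples with different feature sizes.
sources = [rng.normal(size=(n_samples, 5)), rng.normal(size=(n_samples, 7))]

embeddings = []
for x in sources:
    a_norm = normalize_adjacency(knn_adjacency(x, k))
    w = rng.normal(size=(x.shape[1], hidden_dim))  # random stand-in for learned weights
    embeddings.append(gcn_layer(a_norm, x, w))

# Every source now has the same shape, so they stack into one tensor.
tensor = np.stack(embeddings, axis=0)  # (n_sources, n_samples, hidden_dim)
```

After the per-source GCN step each source is an `(n_samples, hidden_dim)` matrix regardless of its original feature dimension, which is what makes the stacking into a third-order tensor possible.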

Key words: multi-source data representation learning, Graph Convolutional Network (GCN), tensor graph convolution operator, contrastive learning, semi-supervised classification

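Abstract (translated from the Chinese): Existing multi-source data representation learning models, when processing large-scale, complex, high-dimensional data, tend to miss high-order association information among data sources and are easily disturbed by noise. To address these problems, a multi-source data representation learning model based on tensorized graph convolutional neural network and contrastive learning (MS-TGC) is proposed. First, the K-Nearest Neighbors (KNN) algorithm and Graph Convolutional Network (GCN) are used to unify the dimensions of the multi-source data, and the results are concatenated into tensorized multi-source data. Second, the defined tensor graph convolution operator performs high-dimensional graph convolution, learning intra-source information and inter-source association information at the same time. Finally, a multi-source contrastive learning paradigm is constructed; contrastive constraints based on semantic consistency and label consistency improve the representation learning accuracy of MS-TGC on noisy data and strengthen the model's robustness. Experimental results show that, at a labeled sample rate of 0.3, MS-TGC improves semi-supervised classification accuracy over the CONMF (Co-consensus Orthogonal Non-negative Matrix Factorization) model by 1.36% on the BDGP (Berkeley Drosophila Genome Project) dataset and by 5.53% on the 20newsgroup dataset. MS-TGC can thus capture inter-source association information more effectively, reduce noise interference, and obtain high-quality multi-source data representations.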
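The abstract does not give the form of the semantic-consistency and label-consistency constraints. As a rough illustration only, the sketch below uses a generic InfoNCE-style cross-source contrastive loss in NumPy, where a positive-pair mask decides which cross-source pairs are pulled together: the identity mask treats the same sample in both sources as the positive (one reading of semantic consistency), and a label-equality mask treats same-class samples as positives (one reading of label consistency). The function name `contrastive_loss`, the temperature value, and the toy data are assumptions, not the paper's formulation.

```python
import numpy as np


def contrastive_loss(z_a, z_b, pos_mask, temperature=0.5):
    """Mean negative log-probability of positive pairs between two sources.

    z_a, z_b : (n, d) embeddings of the same n samples from two sources.
    pos_mask : (n, n) 0/1 matrix; pos_mask[i, j] = 1 marks (a_i, b_j) as a positive pair.
    """
    # Cosine similarity via L2-normalised embeddings.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average the log-probability over each anchor's positives.
    per_anchor = (pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return float(-per_anchor.mean())


rng = np.random.default_rng(0)
z_a = rng.normal(size=(6, 4))               # hypothetical embeddings from source A
z_b = z_a + 0.1 * rng.normal(size=(6, 4))   # source B: a noisy view of the same samples
labels = np.array([0, 0, 1, 1, 2, 2])       # hypothetical class labels

semantic_mask = np.eye(6)                                        # same sample across sources
label_mask = (labels[:, None] == labels[None, :]).astype(float)  # same class across sources

loss_semantic = contrastive_loss(z_a, z_b, semantic_mask)
loss_label = contrastive_loss(z_a, z_b, label_mask)
```

In a semi-supervised setting the label mask would be restricted to the labeled samples; how the two terms are weighted against the classification loss is not stated in the abstract.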

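Key words (translated from the Chinese): multi-source data representation learning, graph convolutional neural network, tensor graph convolution operator, contrastive learning, semi-supervised classification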
