Journal of Computer Applications, 2022, Vol. 42, Issue 11: 3354-3363. DOI: 10.11772/j.issn.1001-9081.2021111981

• CCF Bigdata 2021

Graph convolutional network method based on hybrid feature modeling

Zhuoran LI1,2,3,4, Zhonglin YE1,2,3,4, Haixing ZHAO1,2,3,4, Jingjing LIN1,2,3,4

  1. Computer College, Qinghai Normal University, Xining, Qinghai 810016, China
  2. State Key Laboratory of Tibetan Intelligent Information Processing and Application Co-established by Ministry of Science and Technology and Qinghai Province (Qinghai Normal University), Xining, Qinghai 810008, China
  3. Key Laboratory of Tibetan Information Processing, Ministry of Education (Qinghai Normal University), Xining, Qinghai 810008, China
  4. Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province (Qinghai Normal University), Xining, Qinghai 810008, China
  • Received: 2021-11-22; Revised: 2022-01-12; Accepted: 2022-01-14; Online: 2022-01-25; Published: 2022-11-10
  • Contact: Haixing ZHAO
  • About the authors: LI Zhuoran, born in 1996, M.S. candidate. His research interests include data mining and graph neural networks.
    YE Zhonglin, born in 1989, Ph.D., associate professor. His research interests include question answering systems and network representation learning.
    ZHAO Haixing, born in 1969, Ph.D., professor. His research interests include complex networks and network reliability.
    LIN Jingjing, born in 1986, Ph.D. candidate, lecturer. Her research interests include data mining and hypergraph neural networks.
  • Supported by:
    National Key Research and Development Program of China (2020YFC1523300); Natural Science Foundation of Qinghai Province (2021-ZJ-946Q); Middle-Youth Natural Science Foundation of Qinghai Normal University (2020QZR007)

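Graph convolutional network method based on hybrid feature modeling (基于混合特征建模的图卷积网络方法)

LI Zhuoran (李卓然)1,2,3,4, YE Zhonglin (冶忠林)1,2,3,4, ZHAO Haixing (赵海兴)1,2,3,4, LIN Jingjing (林晶晶)1,2,3,4

  1. Computer College, Qinghai Normal University, Xining 810016
  2. State Key Laboratory of Tibetan Intelligent Information Processing and Application, co-established by the province and the ministry (Qinghai Normal University), Xining 810008
  3. Key Laboratory of Tibetan Information Processing, Ministry of Education (Qinghai Normal University), Xining 810008
  4. Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province (Qinghai Normal University), Xining 810008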
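  • Corresponding author: ZHAO Haixing (赵海兴)
  • About the authors: LI Zhuoran (李卓然), born in 1996, male, from Ulanqab, Inner Mongolia, M.S. candidate, CCF member. His research interests include data mining and graph neural networks.
    YE Zhonglin (冶忠林), born in 1989, male, from Minhe, Qinghai, associate professor, Ph.D., CCF member. His research interests include question answering systems and network representation learning.
    ZHAO Haixing (赵海兴), born in 1969, male, from Huangzhong, Qinghai, professor, Ph.D., CCF member. His research interests include complex networks and network reliability. Email: h.x.zhao@163.com
    LIN Jingjing (林晶晶), born in 1986, female, from Lintao, Gansu, lecturer, Ph.D. candidate, CCF member. Her research interests include data mining and hypergraph neural networks.
  • Supported by:
    National Key Research and Development Program of China (2020YFC1523300); Natural Science Foundation of Qinghai Province (2021-ZJ-946Q); Middle-Youth Natural Science Foundation of Qinghai Normal University (2020QZR007)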

Abstract:

Networks contain complex information, and more ways are needed to extract what is useful from it, but existing single-feature Graph Neural Networks (GNNs) cannot completely describe the relevant characteristics of a network. To resolve this problem, a Hybrid feature-based Dual Graph Convolutional Network (HDGCN) was proposed. Firstly, the structure feature vectors and semantic feature vectors of nodes were obtained by Graph Convolutional Network (GCN). Secondly, the features of nodes were aggregated selectively by an aggregation function based on an attention mechanism or a gating mechanism, which enhanced the feature expression ability of nodes. Finally, the hybrid feature vectors of nodes were obtained by a fusion mechanism based on a dual-channel GCN, and the structure features and semantic features of nodes were modeled jointly so that the features complement each other and improve the method's performance on subsequent machine learning tasks. Verification was performed on the datasets CiteSeer, DBLP (DataBase systems and Logic Programming) and SDBLP (Simplified DataBase systems and Logic Programming). Experimental results show that, compared with the GCN model trained on structure features, the dual-channel GCN model trained on hybrid features increases the average Micro-F1 by 2.43, 2.14, 1.86 and 2.13 percentage points and the average Macro-F1 by 1.38, 0.33, 1.06 and 0.86 percentage points when the training set proportion is 20%, 40%, 60% and 80% respectively. The difference in accuracy is no more than 0.5 percentage points between concat and mean as the fusion strategy, which shows that either can be used. HDGCN has higher accuracy on node classification and clustering tasks than models trained on the structure or semantic network alone, and achieves the best results when the output dimension is 64, the learning rate is 0.001, the number of graph convolutional layers is 2 and the attention vector dimension is 128.
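
The dual-channel idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard GCN propagation rule with symmetric normalization, leaves out the attention- or gating-based aggregation entirely, and uses hypothetical toy graphs (a ring as the structure network, a star as the semantic network) with random, untrained weights. Only the 2-layer depth, the output dimension of 64 and the concat and mean fusion strategies are taken from the abstract.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, weights):
    """Stacked GCN layers; ReLU between layers, none after the last one."""
    a_norm = normalize_adjacency(adj)
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def fuse(h_struct, h_sem, strategy="concat"):
    """Combine the two channel outputs by concatenation or element-wise mean."""
    if strategy == "concat":
        return np.concatenate([h_struct, h_sem], axis=1)
    if strategy == "mean":
        return (h_struct + h_sem) / 2.0
    raise ValueError(f"unknown fusion strategy: {strategy}")


rng = np.random.default_rng(0)
n, d_in, d_hidden, d_out = 5, 8, 16, 64  # d_out = 64 as in the abstract

# Hypothetical inputs: random node features and two toy graphs.
x = rng.normal(size=(n, d_in))
ring = np.roll(np.eye(n), 1, axis=1)
adj_struct = ring + ring.T               # structure network (ring)
adj_sem = np.zeros((n, n))               # semantic network (star)
adj_sem[0, 1:] = 1.0
adj_sem[1:, 0] = 1.0


def make_weights():
    """Random weights for one 2-layer channel (untrained, illustration only)."""
    return [rng.normal(scale=0.1, size=(d_in, d_hidden)),
            rng.normal(scale=0.1, size=(d_hidden, d_out))]


h_struct = gcn_forward(adj_struct, x, make_weights())
h_sem = gcn_forward(adj_sem, x, make_weights())

h_concat = fuse(h_struct, h_sem, "concat")  # width doubles to 2 * d_out
h_mean = fuse(h_struct, h_sem, "mean")      # width stays at d_out
print(h_concat.shape, h_mean.shape)
```

Concatenation keeps both channels intact at the cost of doubling the embedding width, while the mean keeps the width fixed; the abstract reports that the two differ by no more than 0.5 percentage points in accuracy.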

Key words: attention mechanism, gating mechanism, dual channel graph convolutional network, structure feature, semantic feature

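Abstract (translated from Chinese):

Networks contain complex information, and more ways are needed to extract what is useful from it, yet existing single-feature Graph Neural Networks (GNNs) cannot fully characterize the relevant properties of a network. To address this problem, a graph convolutional network method based on hybrid features (HDGCN) is proposed. First, the structure feature vectors and semantic feature vectors of nodes are obtained with a Graph Convolutional Network (GCN). Then, an improved aggregation function based on an attention mechanism or a gating mechanism selectively aggregates the features of nodes in the semantic network, strengthening the feature expression ability of the nodes. Finally, the hybrid feature vectors of nodes are obtained through a fusion mechanism based on a dual-channel GCN, which jointly models the structure features and semantic features of nodes so that the features complement each other and improve the method's performance on subsequent machine learning tasks. Experiments on the three datasets CiteSeer, DBLP and SDBLP show that, compared with a GCN trained on structure features, HDGCN improves Micro-F1 by 2.43, 2.14, 1.86 and 2.13 percentage points on average and Macro-F1 by 1.38, 0.33, 1.06 and 0.86 percentage points on average at training set proportions of 20%, 40%, 60% and 80% respectively. With concatenation or mean as the fusion strategy, accuracy differs by no more than 0.5 percentage points, so either can serve as the fusion strategy. HDGCN achieves higher accuracy on node classification and clustering tasks than models trained on the structure or semantic network alone, and performs best with an output dimension of 64, a learning rate of 0.001, 2 graph convolutional layers and a 128-dimensional attention vector.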
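
The results above are reported as Micro-F1 and Macro-F1. As a reminder of how the two differ, the following sketch computes both for a single-label multi-class prediction using their standard definitions; the function name `f1_scores` and the toy labels are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def f1_scores(y_true, y_pred, num_classes):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.zeros(num_classes)
    fp = np.zeros(num_classes)
    fn = np.zeros(num_classes)
    for c in range(num_classes):
        tp[c] = np.sum((y_pred == c) & (y_true == c))
        fp[c] = np.sum((y_pred == c) & (y_true != c))
        fn[c] = np.sum((y_pred != c) & (y_true == c))

    # Micro-F1 pools the counts over all classes before computing F1.
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())

    # Macro-F1 averages per-class F1, so every class weighs the same.
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom,
                          out=np.zeros(num_classes), where=denom > 0)
    macro = per_class.mean()
    return micro, macro


# Toy labels (hypothetical): three classes, one misclassified node.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2]
micro, macro = f1_scores(y_true, y_pred, num_classes=3)
print(f"Micro-F1 = {micro:.4f}, Macro-F1 = {macro:.4f}")
```

Because Micro-F1 pools counts, it is dominated by large classes, whereas Macro-F1 treats each class equally; this is one reason the two metrics can improve by different amounts on the same dataset.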

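Key words (translated from Chinese): attention mechanism, gating mechanism, dual-channel graph convolutional network, structure feature, semantic feature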
