Journal of Computer Applications, 2024, Vol. 44, Issue (5): 1464-1470. DOI: 10.11772/j.issn.1001-9081.2023050846

Special topic: The 19th China Conference on Machine Learning (CCML 2023)

Node classification algorithm fusing 2-connected motif-structure information

Wenping ZHENG1,2,3, Huilin GE1, Meilin LIU1, Gui YANG1

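  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  2. Key Laboratory of Computation Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China
  3. Institute of Intelligent Information Processing, Shanxi University, Taiyuan Shanxi 030006, China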
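  • Received: 2023-06-29 Revised: 2023-07-26 Accepted: 2023-07-31 Online: 2023-08-16 Published: 2024-05-10
  • Corresponding author: Wenping ZHENG
  • About author: GE Huilin, born in 1997, native of Yangquan, Shanxi, M. S. candidate. Her research interests include complex network analysis.
    LIU Meilin, born in 1996, native of Lüliang, Shanxi, Ph. D. candidate. Her research interests include complex network analysis, higher-order networks.
    YANG Gui, born in 1975, native of Datong, Shanxi, M. S., senior experimentalist. His research interests include data mining, bioinformatics.
    First contact person: ZHENG Wenping, born in 1979, native of Jinzhong, Shanxi, Ph. D., professor, CCF member. Her research interests include complex network analysis, bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (62072292); 1331 Engineering Project of Shanxi Province; University-Industry Collaborative Education Program of Ministry of Education (220902842025336)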

Abstract:

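Node representation learning encodes the information in graph-structured data into a low-dimensional latent space and is widely used in machine learning tasks such as node classification, clustering and link prediction. In complex networks, nodes are related not only through low-order structures formed by direct connections, but also through higher-order structures formed by specific connection patterns, which are called motifs. A node classification algorithm Fusing 2-connected Motif-structure Information (FMI) was proposed, which learns node representations from the higher-order 2-connected motif information among nodes and uses them for node classification. Firstly, the 2-connected motifs in the network were counted, and this information was used to define a measure of node importance called the motif-ratio. Sampling probabilities were then computed from the motif-ratio to carry out neighborhood sampling. A weighted auxiliary graph was constructed to fuse the low-order and higher-order relations between network nodes, and node representations were obtained by weighted neighborhood aggregation. Node classification experiments were performed on 5 datasets, Cora, Citeseer, Pubmed, Wiki and DBLP. Compared with 5 classical baseline algorithms, FMI performs well on metrics such as accuracy and F1-score.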
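The pipeline summarized above (motif counting, motif-ratio, motif-guided neighborhood sampling, weighted auxiliary graph, weighted aggregation) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the FMI implementation. The abstract does not say which 2-connected motifs are counted or how the motif-ratio, sampling probabilities and auxiliary-graph weights are defined. Here only triangles (the smallest 2-connected motif) are counted, and `motif_ratio` (triangles containing a node divided by its degree), the `eps` smoothing term, the `alpha` edge weight and the 0.5/0.5 self/neighbor mix in `aggregate` are all hypothetical choices.

```python
import itertools
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected simple-graph adjacency sets from an edge list (self-loops dropped)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def count_triangles(adj):
    """Count triangles (smallest 2-connected motif) per node and per edge.

    Each triangle u < v < w is visited exactly once, from its smallest node u.
    """
    node_tri = defaultdict(int)
    edge_tri = defaultdict(int)
    for u in adj:
        for v, w in itertools.combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                for x in (u, v, w):
                    node_tri[x] += 1
                for a, b in ((u, v), (u, w), (v, w)):
                    edge_tri[frozenset((a, b))] += 1
    return node_tri, edge_tri


def motif_ratio(adj, node_tri):
    """HYPOTHETICAL motif-ratio: triangles containing v divided by deg(v)."""
    return {v: node_tri[v] / len(adj[v]) for v in adj}


def sample_neighbors(adj, ratio, u, k, eps=1e-3, rng=random):
    """Draw k neighbors of u with replacement, with probability proportional
    to ratio[v] + eps (eps keeps triangle-free neighbors reachable)."""
    neighbors = sorted(adj[u])
    weights = [ratio.get(v, 0.0) + eps for v in neighbors]
    return rng.choices(neighbors, weights=weights, k=k)


def auxiliary_weights(adj, edge_tri, alpha=1.0):
    """HYPOTHETICAL auxiliary-graph weights: 1 for the direct (low-order) edge
    plus alpha times the number of triangles sharing that edge (higher-order)."""
    return {frozenset((u, v)): 1.0 + alpha * edge_tri[frozenset((u, v))]
            for u in adj for v in adj[u]}


def aggregate(u, sampled, features, weights):
    """One untrained step: average u's own feature vector with the
    edge-weighted mean of its sampled neighbors' feature vectors."""
    if not sampled:
        return list(features[u])
    total = sum(weights[frozenset((u, v))] for v in sampled)
    neigh = [sum(weights[frozenset((u, v))] * features[v][i] for v in sampled) / total
             for i in range(len(features[u]))]
    return [0.5 * s + 0.5 * n for s, n in zip(features[u], neigh)]


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    node_tri, edge_tri = count_triangles(adj)
    ratio = motif_ratio(adj, node_tri)
    w = auxiliary_weights(adj, edge_tri)
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}
    sampled = sample_neighbors(adj, ratio, 2, k=5, rng=random.Random(0))
    print(ratio, sampled, aggregate(2, sampled, feats, w))
```

In the toy graph, node 3 lies in no triangle, so it gets a motif-ratio of 0 and is rarely sampled as a neighbor of node 2, while edges inside the triangle carry weight 2 instead of 1 during aggregation. In the paper the resulting representations are used for node classification; this sketch has no learned parameters and only shows how motif-derived quantities can bias both which neighbors are sampled and how much each one contributes.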

Key words: node representation, 2-connected motif, neighborhood sampling, neighborhood aggregation, node classification

CLC number: