Journal of Computer Applications, 2023, Vol. 43, Issue (9): 2775-2783. DOI: 10.11772/j.issn.1001-9081.2022081266

• Data Science and Technology •

Scientific collaboration potential prediction based on dynamic heterogeneous information fusion

Guoshuai MA1,2, Yuhua QIAN1,2,3, Yayu ZHANG1,2, Junxia LI1,2, Guoqing LIU1,2

  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China
    2. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
    3. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China
  • Received: 2022-08-26 Revised: 2022-11-04 Accepted: 2022-11-14 Online: 2023-01-11 Published: 2023-09-10
  • Contact: Yuhua QIAN (jinchengqyh@126.com)
  • About the authors: MA Guoshuai, born in 1992 in Lüliang, Shanxi, Ph.D. candidate, CCF member. His research interests include complex networks and data mining.
    ZHANG Yayu, born in 1993 in Changzhi, Shanxi, Ph.D. candidate, CCF member. Her research interests include evolutionary computing and data mining.
    LI Junxia, born in 1996 in Linfen, Shanxi, M.S. candidate. Her research interests include pattern recognition, artificial intelligence, and machine learning.
    LIU Guoqing, born in 1994 in Linfen, Shanxi, Ph.D. candidate, CCF member. Her research interests include reinforcement learning and data mining.
  • Supported by:
    National Natural Science Foundation of China (62136005); National Key Research and Development Program of China (2021ZD0112402)

Abstract:

Existing methods for predicting scientific collaboration potential rely on feature engineering to manually extract shallow, static attributes of authors in scientific collaboration networks, and they ignore the relationships among heterogeneous entities in those networks. To address these shortcomings, a dynamic Collaboration Potential Prediction (CPP) model is proposed that fuses the latent attribute information of multiple entity types in scientific collaboration networks. The model considers the structural features of scholar-scholar collaboration relationships while extracting the attributes of heterogeneous entities, and it is trained by collaborative optimization so that it recommends collaborators for scholars and predicts collaboration potential at the same time. To verify the effectiveness of the model, information on more than 500,000 papers published in journals recommended by the China Computer Federation (CCF), together with the complete attribute information of the related entities, was collected and collated. Temporal heterogeneous collaboration networks for different periods were then constructed with a sliding window method to extract the dynamic attribute information of each entity as the collaboration network evolves. In addition, to improve the generalization and practicality of the model, data from randomly selected periods were used as training input. Experimental results show that, compared with the second-best model, GraphSAGE (Graph SAmple and aggreGatE), CPP improves classification accuracy on the collaborator recommendation task by 1.47 percentage points and reduces test error on the collaboration potential prediction task by 1.23%. These results indicate that the CPP model can recommend high-quality collaborators for scholars more accurately.

Key words: collaboration potential prediction, heterogeneous graph neural network, information fusion, scientific collaborator recommendation, temporal network
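
The sliding window construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`year`, `authors`), the window width and step, and the function names `sliding_windows` and `build_snapshots` are all assumptions made for illustration. For brevity only author-author edges are built, whereas the paper's networks are heterogeneous and include other entity types.

```python
from collections import defaultdict
from itertools import combinations


def sliding_windows(first_year, last_year, width, step):
    """Yield inclusive (start, end) year ranges that fit inside the data span."""
    start = first_year
    while start + width - 1 <= last_year:
        yield start, start + width - 1
        start += step


def build_snapshots(papers, width=3, step=1):
    """Map each window to a co-authorship edge dict {(a, b): joint paper count}.

    `papers` is a non-empty list of dicts with hypothetical keys
    'year' (int) and 'authors' (list of author identifiers).
    """
    years = [p["year"] for p in papers]
    snapshots = {}
    for start, end in sliding_windows(min(years), max(years), width, step):
        edges = defaultdict(int)
        for p in papers:
            if start <= p["year"] <= end:
                # Sort so that each undirected pair has one canonical key.
                for a, b in combinations(sorted(set(p["authors"])), 2):
                    edges[(a, b)] += 1
        snapshots[(start, end)] = dict(edges)
    return snapshots


# Toy records, invented purely for this example.
papers = [
    {"year": 2018, "authors": ["A", "B"]},
    {"year": 2019, "authors": ["A", "B", "C"]},
    {"year": 2020, "authors": ["B", "C"]},
    {"year": 2021, "authors": ["C", "D"]},
]
snapshots = build_snapshots(papers, width=3, step=1)
for window, edges in snapshots.items():
    print(window, edges)
```

Each snapshot could then serve as one time step from which per-entity dynamic attributes are extracted. Choosing overlapping windows (a step smaller than the width) keeps consecutive snapshots similar, which is one plausible way to capture gradual network evolution.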

CLC Number: