Scientific collaboration potential prediction based on dynamic heterogeneous information fusion

DOI: 10.11772/j.issn.1001-9081.2022081266

Journal of Computer Applications, 2023, Vol. 43, Issue (9): 2775-2783.

Guoshuai MA (1,2), Yuhua QIAN (1,2,3), Yayu ZHANG (1,2), Junxia LI (1,2), Guoqing LIU (1,2)

1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China
2. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
3. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan Shanxi 030006, China

Received: 2022-08-26  Revised: 2022-11-04  Accepted: 2022-11-14  Online: 2023-01-11  Published: 2023-09-10
Corresponding author: Yuhua QIAN (jinchengqyh@126.com)
About the authors: MA Guoshuai, born in 1992, male, from Lüliang, Shanxi, Ph. D. candidate, CCF member. His research interests include complex networks and data mining.
ZHANG Yayu, born in 1993, female, from Changzhi, Shanxi, Ph. D. candidate, CCF member. Her research interests include evolutionary computing and data mining.
LI Junxia, born in 1996, female, from Linfen, Shanxi, M. S. candidate. Her research interests include pattern recognition, artificial intelligence, and machine learning.
LIU Guoqing, born in 1994, female, from Linfen, Shanxi, Ph. D. candidate, CCF member. Her research interests include reinforcement learning and data mining.
Supported by: National Natural Science Foundation of China (62136005); National Key Research and Development Program of China (2021ZD0112402)

Abstract:

Existing scientific collaboration potential prediction methods rely on feature engineering to manually extract shallow, static attributes of authors in scientific collaboration networks, and they ignore the relationships among the heterogeneous entities in these networks. To address these shortcomings, a dynamic Collaboration Potential Prediction (CPP) model was proposed that fuses the latent attribute information of multiple entity types in scientific collaboration networks. The model extracts the attributes of heterogeneous entities while also accounting for the structural features of scholar-scholar collaboration relationships, and it is trained by co-optimization, so that it recommends scientific collaborators for scholars and predicts collaboration potential at the same time. To verify the effectiveness of the proposed model, information on more than 500 000 papers published in journals recommended by the China Computer Federation (CCF), together with complete attribute information of the related entities, was collected and curated. Temporal collaborative heterogeneous networks covering different time periods were then constructed with a sliding window, in order to extract the dynamic attribute information of each entity as the scientific collaboration network evolves. In addition, to improve the generalization and practicality of the proposed model, data from different time periods were fed to the model in random order during training. Experimental results show that, compared with GraphSAGE (Graph SAmple and aggreGatE), the second-best model on collaborator recommendation, CPP improves precision on the collaborator recommendation task by 1.47 percentage points and lowers the test error (MSE) on the collaboration potential prediction task by 1.23% (relative). These results indicate that CPP can recommend high-quality collaborators to scholars more accurately.

Key words: collaboration potential prediction, heterogeneous graph neural network, information fusion, scientific collaborator recommendation, temporal network

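The two headline numbers in the abstract use different units: the precision gain is an absolute difference in percentage points, while the 1.23% error reduction is relative to GraphSAGE's test MSE. The short check below recomputes both from the values reported in Tab. 3; it is only an arithmetic sketch, not code from the paper.

```python
# Reported values from Tab. 3: precision in %, MSE loss is unitless.
GRAPHSAGE_PRECISION, CPP_PRECISION = 74.82, 76.29
GRAPHSAGE_MSE, CPP_MSE = 0.1464, 0.1446


def percentage_point_gain(baseline: float, model: float) -> float:
    """Absolute difference between two percentages (percentage points)."""
    return model - baseline


def relative_reduction(baseline: float, model: float) -> float:
    """Reduction of an error measure as a percentage of the baseline value."""
    return (baseline - model) / baseline * 100


if __name__ == "__main__":
    gain = percentage_point_gain(GRAPHSAGE_PRECISION, CPP_PRECISION)
    drop = relative_reduction(GRAPHSAGE_MSE, CPP_MSE)
    print(f"precision: +{gain:.2f} percentage points")  # +1.47
    print(f"test MSE:  -{drop:.2f}% relative")          # -1.23%
```

Applying the same formula against HAN, which has the lowest MSE among the baselines in Tab. 3 (0.1460), gives a relative reduction of about 0.96%.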
CLC number: TP391

Guoshuai MA, Yuhua QIAN, Yayu ZHANG, Junxia LI, Guoqing LIU. Scientific collaboration potential prediction based on dynamic heterogeneous information fusion[J]. Journal of Computer Applications, 2023, 43(9): 2775-2783.

Figures and tables (10)

Fig. 1 Relationships among entities in scientific collaboration network

Fig. 2 Structure of the collaboration potential prediction model with dynamic heterogeneous information fusion

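Fig. 2 itself is not reproduced in this text. As a rough, hypothetical illustration of what fusing heterogeneous information can mean inside a graph neural network layer, the sketch below averages neighbor features separately for each neighbor type (author, institution, journal) and concatenates these per-type summaries with the node's own features, in the spirit of a GraphSAGE mean aggregator. It has no learned weights and is not the CPP architecture; the toy graph, node names, and features are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector when a node has no neighbors of a type."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def type_aware_aggregate(
    node: str,
    features: Dict[str, Vector],
    node_type: Dict[str, str],
    neighbors: Dict[str, List[str]],
    types: Sequence[str],
) -> Vector:
    """Concatenate a node's own features with one mean vector per neighbor type.

    No learned weights or nonlinearity: this only shows the grouping-and-fusing
    step that a heterogeneous GNN layer would apply before its transformation.
    """
    dim = len(features[node])
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for nb in neighbors.get(node, []):
        grouped[node_type[nb]].append(features[nb])
    fused = list(features[node])
    for t in types:  # fixed type order keeps the output layout stable
        fused.extend(mean_vector(grouped[t], dim))
    return fused


# Hypothetical toy graph: node names, types and features are made up.
features = {
    "a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0],
    "org1": [2.0, 2.0], "j1": [4.0, 0.0],
}
node_type = {"a1": "author", "a2": "author", "a3": "author",
             "org1": "institution", "j1": "journal"}
neighbors = {"a1": ["a2", "a3", "org1"]}  # a1 has no journal neighbor here

print(type_aware_aggregate("a1", features, node_type, neighbors,
                           ("author", "institution", "journal")))
# [1.0, 0.0, 0.5, 1.0, 2.0, 2.0, 0.0, 0.0]
```

Keeping one slot per type, filled with zeros when a type is absent, gives every node an output of the same length regardless of which entity types it touches.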
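Tab. 1 Entities and related information in the collaboration network of CCF-recommended journals

No.	Entity	Attribute	Type
1	Paper	Title	String
2	Paper	Abstract	String
3	Author	Name	String
4	Author	Total number of published papers	Int
5	Institution	Name	String
6	Institution	Total number of published papers	Int
7	Institution	Total citations of published papers	Int
8	Institution	Longitude	Float
9	Institution	Latitude	Float
10	Journal	Name	String
11	Journal	Total number of included papers	Int
12	Journal	Total citations of included papers	Int
13	Journal	Journal tier	String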

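The attributes in Tab. 1 map naturally onto one record type per entity. The sketch below uses Python dataclasses with the attribute types from the table; the field names are hypothetical, the edge types linking the entities (Fig. 1) are not modeled, and String attributes such as titles and abstracts would need a separate text encoder, which is out of scope here.

```python
from dataclasses import dataclass, fields
from typing import List

# Field names are hypothetical; the attribute types follow Tab. 1.


@dataclass
class Paper:
    title: str
    abstract: str


@dataclass
class Author:
    name: str
    paper_count: int


@dataclass
class Institution:
    name: str
    paper_count: int
    citation_count: int
    longitude: float
    latitude: float


@dataclass
class Journal:
    name: str
    paper_count: int     # papers included in the journal
    citation_count: int  # total citations of those papers
    tier: str            # journal tier, kept as a string as in Tab. 1


def numeric_features(entity) -> List[float]:
    """Collect the Int/Float attributes of an entity, in declaration order."""
    values = (getattr(entity, f.name) for f in fields(entity))
    return [float(v) for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


# Made-up example values, not taken from the dataset.
inst = Institution("Example University", 1200, 5400, 112.55, 37.87)
print(numeric_features(inst))  # [1200.0, 5400.0, 112.55, 37.87]
```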
Tab. 2 Details of the training sets built from the collaboration heterogeneous network of CCF-recommended journals

Network start year	Network end year	Papers	Authors	Links	Collaboration start year	Collaboration end year	Collaboration links
1998	2007	132 460	156 510	323 220	2008	2010	197
1999	2008	142 494	170 928	355 766	2009	2011	230
2000	2009	153 306	187 125	392 043	2010	2012	324
2001	2010	165 329	204 332	431 078	2011	2013	351
2002	2011	177 846	223 107	473 775	2012	2014	470
2003	2012	191 038	243 473	519 827	2013	2015	556

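The year ranges in Tab. 2 follow a regular sliding-window pattern: each network covers 10 consecutive years, its collaboration relationships come from the 3 years immediately after it, and the window moves forward one year at a time. The helper below regenerates these ranges; the window lengths and step are read off the table rather than taken from the paper's text.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    net_start: int     # first year of the heterogeneous collaboration network
    net_end: int       # last year of the network (inclusive)
    collab_start: int  # first year of the collaboration-relationship window
    collab_end: int    # last year of that window (inclusive)


def sliding_windows(first_year: int, count: int, net_len: int = 10,
                    collab_len: int = 3, step: int = 1) -> List[Window]:
    """Year ranges of successive (network, collaboration) window pairs."""
    windows = []
    for i in range(count):
        start = first_year + i * step
        end = start + net_len - 1
        windows.append(Window(start, end, end + 1, end + collab_len))
    return windows


for w in sliding_windows(1998, 6):
    print(*w)
# 1998 2007 2008 2010
# ...
# 2003 2012 2013 2015
```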
Fig. 3 Accuracy change for different tasks under co-optimization and separate optimization

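Fig. 3 contrasts co-optimization with optimizing each task separately. The exact CPP objective is not given in this section; one common way to co-optimize a classification task (collaborator recommendation) and a regression task (collaboration potential) is to minimize a weighted sum of their losses. In the sketch below, binary cross-entropy, MSE, and the weight `lam` are all assumptions.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean BCE for the recommendation task (1 = the pair collaborates)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def mean_squared_error(targets: Sequence[float], preds: Sequence[float]) -> float:
    """MSE for the regression task (predicted collaboration potential)."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


def joint_loss(labels, probs, targets, preds, lam: float = 1.0) -> float:
    """Assumed co-optimization objective: BCE + lam * MSE.

    Minimizing one scalar lets gradients from both tasks reach any parameters
    the two task heads share; separate optimization would use each term alone.
    """
    return binary_cross_entropy(labels, probs) + lam * mean_squared_error(targets, preds)


print(joint_loss([1, 0], [0.9, 0.2], [0.8, 0.1], [0.7, 0.1], lam=0.5))
```

Tuning `lam` trades off the two tasks; with `lam = 0` the objective reduces to the recommendation loss alone.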
Fig. 4 Accuracy change of models fusing partial feature information

Fig. 5 Influence of single-period training and cross-training on model performance

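Fig. 5 compares training on a single period with cross-training. The abstract states only that data from different periods are fed to the model in random order; the schedule below is one hypothetical way to do that, reshuffling the period order every epoch with a seeded random generator. `train_step` is a placeholder callback, not part of the paper.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cross_period_schedule(periods: Sequence[T], epochs: int, seed: int = 0) -> List[T]:
    """Order in which period snapshots are visited during training.

    Each epoch visits every period exactly once, in a freshly shuffled order.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    order: List[T] = []
    for _ in range(epochs):
        epoch = list(periods)
        rng.shuffle(epoch)
        order.extend(epoch)
    return order


def train(periods: Sequence[T], epochs: int,
          train_step: Callable[[T], None], seed: int = 0) -> None:
    for period in cross_period_schedule(periods, epochs, seed):
        train_step(period)  # hypothetical: one optimization pass on that snapshot


# Network year ranges from Tab. 2, used as period identifiers.
periods = [(1998, 2007), (1999, 2008), (2000, 2009),
           (2001, 2010), (2002, 2011), (2003, 2012)]
train(periods, epochs=2, train_step=print)
```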
Fig. 6 Influence of the number of collaborations on collaboration potential prediction performance

Fig. 7 Performance of models with different hidden layer dimensions

Tab. 3 Performance comparison of CPP and other algorithms

Precision, recall, and F1 refer to the collaborator recommendation task; MSE loss refers to the collaboration potential prediction task.

Algorithm	Precision/%	Recall/%	F1/%	MSE loss
Decision tree	26.86	28.44	27.99	1.6360
GCN	71.55	80.10	74.48	0.1467
GAT	70.98	78.60	74.34	0.1465
GraphSAGE	74.82	81.56	75.83	0.1464
HAN	73.32	-	-	0.1460
CPP	76.29	81.78	75.57	0.1446

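This section does not say how the classification metrics in Tab. 3 are averaged. The F1 values there are not the harmonic mean of the precision and recall beside them (for CPP, 2PR/(P+R) would be about 78.9), which is consistent with, for example, per-class (macro) averaging. The sketch below computes macro-averaged precision, recall, and F1 plus MSE; macro averaging is an assumption, not a confirmed detail of the paper.

```python
from typing import Dict, Sequence


def macro_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Macro-averaged precision, recall and F1 (in %), averaging over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    p_sum = r_sum = f_sum = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    n = len(classes)
    return {"precision": 100 * p_sum / n, "recall": 100 * r_sum / n,
            "f1": 100 * f_sum / n}


def mse(targets: Sequence[float], preds: Sequence[float]) -> float:
    """Mean squared error, the metric reported for potential prediction."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


print(macro_prf([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]))
print(mse([1.0, 2.0], [1.0, 4.0]))
```

On this toy input, macro precision and recall are 80.0% and about 66.7%, while macro F1 is 62.5%, below their harmonic mean of about 72.7%.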

