Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1356-1360. DOI: 10.11772/j.issn.1001-9081.2020081206

Special Issue: Data science and technology


PageRank-based talent mining algorithm based on Web of Science

LI Chong1, WANG Yuchen1,2, DU Weijing1,2, HE Xiaotao1, LIU Xuemin1, ZHANG Shibo1, LI Shuren1   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-08-10; Revised: 2020-10-30; Online: 2021-05-10; Published: 2021-05-19
  • Supported by:
    This work is partially supported by the CAS Informatization Special Project in the 13th Five-Year Plan (XXH13504-03).

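  • Corresponding author: WANG Yuchen
  • About the authors: LI Chong (born 1978), male, born in Huoqiu, Anhui; senior engineer, Ph.D., CCF senior member; research interests: big data, recommender systems. WANG Yuchen (born 1996), male, born in Huaiyuan, Anhui; M.S. candidate; research interest: big data management. DU Weijing (born 1993), female, born in Langfang, Hebei; M.S. candidate; research interest: big data management. HE Xiaotao (born 1971), female, born in Hengshui, Hebei; senior engineer, M.S.; research interest: data mining. LIU Xuemin (born 1975), male, born in Yantai, Shandong; senior engineer, M.S.; research interests: big data, cloud computing. ZHANG Shibo (born 1986), male, born in Liaocheng, Shandong; M.S.; research interest: big data. LI Shuren (born 1972), male, born in Bozhou, Anhui; senior engineer, Ph.D.; research interest: data mining.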

Abstract: High-level papers are one of the hallmark achievements of outstanding scientific talent. Focusing on hot research disciplines in Web of Science (WOS), a Neo4j semantic network graph of academic papers was constructed and active scientific research communities were mined from it; a PageRank-based talent mining algorithm was then used to identify outstanding researchers within these communities. Firstly, existing talent mining algorithms were studied and analyzed in detail. Secondly, using WOS paper data, the PageRank-based talent mining algorithm was optimized and implemented by incorporating multiple factors: a paper publication time factor, a decreasing weight model for author order, the influence of neighboring author nodes on the current node, and the number of citations of each paper. Finally, experiments were carried out on the past five years of paper data from a community in the hot discipline of computer science. The results show that community-based mining is more targeted and can quickly locate representative outstanding and promising talent in each discipline, and that the improved algorithm discovers talent more objectively and effectively.
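The abstract names the factors added to PageRank but not their formulas. The following is a minimal Python sketch of one plausible way to combine them, under stated assumptions: each paper's credit is scaled by a publication-time factor and a citation factor, split among coauthors by a decreasing author-order weight, and then propagated over a weighted co-author graph with PageRank, so that a node's score depends on the scores of its neighboring authors. The specific forms (order weight proportional to 1/position, exponential time decay with a hypothetical `decay=0.3`, citation scaling `1 + log(1 + citations)`) and all function names are illustrative assumptions, not the authors' published model; the paper stores its graph in Neo4j, which this in-memory sketch does not use.

```python
import math
from collections import defaultdict


def order_weight(position, n_authors):
    """ASSUMED decreasing author-order model: weight ~ 1/position, normalized to sum to 1."""
    total = sum(1.0 / k for k in range(1, n_authors + 1))
    return (1.0 / position) / total


def time_factor(year, current_year, decay=0.3):
    """ASSUMED exponential time decay: recent papers carry more weight."""
    return math.exp(-decay * (current_year - year))


def citation_factor(citations):
    """ASSUMED log scaling so a few highly cited papers do not dominate."""
    return 1.0 + math.log1p(citations)


def build_graph(papers, current_year):
    """Build a weighted co-author graph from (authors_in_byline_order, year, citations).

    Edge u -> v accumulates credit that u passes to coauthor v on shared papers;
    v's share is larger when v appears earlier in the byline.
    """
    edges = defaultdict(float)
    nodes = set()
    for authors, year, citations in papers:
        nodes.update(authors)
        w_paper = time_factor(year, current_year) * citation_factor(citations)
        n = len(authors)
        for u in authors:
            for pos, v in enumerate(authors, start=1):
                if u != v:
                    edges[(u, v)] += w_paper * order_weight(pos, n)
    return nodes, edges


def weighted_pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank with edge weights; dangling mass is spread uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    out_sum = defaultdict(float)
    incoming = defaultdict(list)
    for (u, v), w in edges.items():
        out_sum[u] += w
        incoming[v].append((u, w))
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Authors with no coauthors (no outgoing edges) redistribute their rank to everyone.
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0.0)
        new = {}
        for v in nodes:
            # Each neighbor u passes rank in proportion to the weight of edge u -> v.
            s = sum(rank[u] * w / out_sum[u] for u, w in incoming[v])
            new[v] = (1 - damping) / n + damping * (s + dangling / n)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy, made-up records: (authors in byline order, publication year, citation count).
    papers = [
        (["A", "B", "C"], 2020, 40),
        (["A", "D"], 2019, 5),
        (["B", "C"], 2016, 2),
        (["E"], 2020, 0),
    ]
    nodes, edges = build_graph(papers, current_year=2020)
    ranks = weighted_pagerank(nodes, edges)
    for author, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(author, round(score, 4))
```

With these toy records, author A (first author of the most recent and most cited paper, with three distinct coauthors) ranks highest, while E, who has no coauthors, receives only the uniform baseline share. Restricting `papers` to the members of one mined community would mirror the community-based mining the abstract describes.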

Key words: Web of Science (WOS), Neo4j graph database, PageRank algorithm, talent mining

