Journal of Computer Applications, 2020, Vol. 40, Issue (11): 3198-3202. DOI: 10.11772/j.issn.1001-9081.2020040516

• Data science and technology •

Research team mining algorithm based on teacher-student relationship

LI Shasha, LIANG Dongyang, YU Jie, JI Bin, MA Jun, TAN Yusong, WU Qingbo   

  1. School of Computer Science, National University of Defense Technology, Changsha Hunan 410073, China
  • Received: 2020-04-23; Revised: 2020-07-03; Online: 2020-11-10; Published: 2020-07-24
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2018YFB1004502).


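  • Corresponding author: JI Bin (纪斌), born in 1990, male, from Weifang, Shandong, Ph.D. candidate. Research interest: information extraction.
  • About the authors: LI Shasha (李莎莎), born in 1982, female, from Changsha, Hunan, Ph.D., associate professor. Research interests: information extraction, text summarization, machine learning, knowledge graphs; LIANG Dongyang (梁冬阳), born in 1992, male, from Hengyang, Hunan, M.S. Research interests: information extraction, knowledge graphs; YU Jie (余杰), born in 1982, male, from Changsha, Hunan, Ph.D., research fellow. Research interests: information extraction, computer architecture, operating systems; MA Jun (马俊), born in 1982, male, from Changsha, Hunan, Ph.D., associate research fellow. Research interests: knowledge graphs, operating systems; TAN Yusong (谭郁松), born in 1976, male, from Changsha, Hunan, Ph.D., research fellow. Research interests: knowledge graphs, operating systems, computer architecture; WU Qingbo (吴庆波), born in 1969, male, from Changsha, Hunan, Ph.D., research fellow. Research interests: knowledge graphs, operating systems, computer architecture.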

Abstract: To mine research teams more rationally, a research team mining algorithm based on the teacher-student relationship was proposed. First, the BiLSTM-CRF neural network model was used to extract teacher and classmate named entities from the acknowledgement sections of academic dissertations. Second, the guidance and cooperation network between teachers and students was constructed. Third, the Louvain algorithm was improved, and a teacher-student relationship based Louvain algorithm was proposed to mine research teams. The performance of the label propagation algorithm, the clustering coefficient algorithm and the Louvain algorithm was compared on datasets such as the American College football dataset. Moreover, the operating efficiency of the teacher-student relationship based Louvain algorithm was compared with that of the original Louvain algorithm on three academic dissertation datasets of different scales. Experimental results show that the larger the dataset, the more pronounced the efficiency improvement of the teacher-student relationship based Louvain algorithm. Finally, the team mining performance of the teacher-student relationship based Louvain algorithm was validated on the academic dissertation dataset of National University of Defense Technology. Experimental results show that, compared with the mining method based on the academic paper cooperation network, the research teams mined by the proposed algorithm are more reasonable in four aspects: closeness of team cooperation, team scale, internal team relationships and team stability.

Key words: community detection, research team, academic dissertation, teacher-student relationship, data mining
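
The pipeline summarized in the abstract (extracted teacher and classmate names, a guidance and cooperation network, then Louvain-style community detection) can be illustrated with a small sketch. It is not the authors' improved algorithm, whose details are not given here: it runs only the local-moving phase of the original Louvain method on a hand-written toy network. The names `build_graph`, `local_moving`, `modularity` and `teams`, the sample people, and the edge weights (2.0 for a teacher link, 1.0 for a classmate link) are all assumptions made for illustration, and the hand-written pairs stand in for the output of the BiLSTM-CRF extraction step.

```python
from collections import defaultdict


def build_graph(teacher_pairs, classmate_pairs,
                teacher_weight=2.0, classmate_weight=1.0):
    """Build an undirected weighted graph as a nested adjacency dict.

    The two weights are illustrative assumptions, not values from the paper.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for pairs, weight in ((teacher_pairs, teacher_weight),
                          (classmate_pairs, classmate_weight)):
        for a, b in pairs:
            if a != b:
                graph[a][b] += weight
                graph[b][a] += weight
    return graph


def modularity(graph, community):
    """Newman modularity Q of a partition given as {node: community label}."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    internal = defaultdict(float)  # internal edge weight, each edge counted twice
    degree = defaultdict(float)    # total weighted degree per community
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            degree[community[u]] += w
            if community[u] == community[v]:
                internal[community[u]] += w
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def local_moving(graph):
    """Phase one of the original Louvain method: greedy node moves."""
    community = {u: u for u in graph}
    node_degree = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    comm_degree = dict(node_degree)
    two_m = sum(node_degree.values())
    improved = True
    while improved:
        improved = False
        for u in sorted(graph):
            old = community[u]
            links = defaultdict(float)  # weight from u to each nearby community
            for v, w in graph[u].items():
                links[community[v]] += w
            comm_degree[old] -= node_degree[u]  # take u out of its community
            best = old
            best_gain = links[old] - comm_degree[old] * node_degree[u] / two_m
            for c in sorted(links):
                # Modularity gain of placing u in c, up to a constant factor.
                gain = links[c] - comm_degree[c] * node_degree[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm_degree[best] += node_degree[u]
            if best != old:
                community[u] = best
                improved = True
    return community


def teams(community):
    """Group nodes by community label and return sorted member lists."""
    groups = defaultdict(list)
    for node, label in community.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())


# Toy input standing in for names extracted from acknowledgements.
teacher_pairs = [("a1", "Prof_A"), ("a2", "Prof_A"),
                 ("b1", "Prof_B"), ("b2", "Prof_B")]
classmate_pairs = [("a1", "a2"), ("b1", "b2"), ("a1", "b1")]

graph = build_graph(teacher_pairs, classmate_pairs)
partition = local_moving(graph)
print(teams(partition))
print(round(modularity(graph, partition), 4))
```

On this toy network the single cross-group classmate link is too weak to merge the two groups, so each advisor ends up in one team with their own students. A full Louvain run would additionally collapse each community into a super-node and repeat the local-moving phase until modularity stops improving.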
