Journal of Computer Applications, 2025, Vol. 45, Issue (3): 755-764. DOI: 10.11772/j.issn.1001-9081.2024101477

• Frontier Research and Typical Applications of Large Models •

ScholatGPT: a large language model for academic social networks and its intelligent applications

Chengzhe YUAN1,2, Guohua CHEN2,3, Dingding LI2,3, Yuan ZHU3, Ronghua LIN2,3, Hao ZHONG2,3, Yong TANG3,4

  1. School of Electronics and Information Engineering, Guangdong Polytechnic Normal University, Guangzhou, Guangdong 510665, China
  2. Pazhou Lab (Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy, Guangzhou), Guangzhou, Guangdong 510335, China
  3. School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631, China
  4. Institute of Data Intelligence, Guangdong University of Science and Technology, Dongguan, Guangdong 523083, China
  • Received: 2024-10-21 Revised: 2025-02-15 Accepted: 2025-02-19 Online: 2025-03-04 Published: 2025-03-10
  • Contact: Guohua CHEN
  • About author: YUAN Chengzhe, born in 1991 in Hanshou, Hunan, Ph.D., lecturer, CCF member. His research interests include text summarization, structured data generation, scholar knowledge graphs, and data cleaning.
    LI Dingding, born in 1982 in Chenzhou, Hunan, Ph.D., professor, CCF member. His research interests include parallel and distributed systems, intelligent systems, and I/O optimization.
    ZHU Yuan, born in 2000 in Hengyang, Hunan, M.S. candidate. His research interests include computer systems and large language models.
    LIN Ronghua, born in 1994 in Puning, Guangdong, Ph.D., associate research fellow, CCF member. His research interests include recommendation systems, educational big data, and social networks.
    ZHONG Hao, born in 1993 in Hepu, Guangxi, Ph.D., associate research fellow, CCF member. His research interests include approximation algorithms, social networks, and large language models for vertical domains.
    TANG Yong, born in 1964 in Zhangjiajie, Hunan, Ph.D., professor, CCF distinguished member. His research interests include data intelligence, social networks, scholar knowledge graphs and large language models, and educational big data.
  • Supported by:
    National Key Research and Development Program of China (2023YFC3341204); Young Scientists Fund of the National Natural Science Foundation of China (62407016)

Abstract:

To address the limitations of existing Large Language Models (LLMs) in processing cross-domain knowledge, keeping up with real-time academic information, and ensuring output quality, ScholatGPT, a scholar LLM based on Academic Social Networks (ASNs), was proposed. ScholatGPT integrates Knowledge-Graph Augmented Generation (KGAG) with Retrieval-Augmented Generation (RAG) to improve precise semantic retrieval and dynamic knowledge updating, and is fine-tuned to improve the quality of generated academic text. Firstly, a scholar knowledge graph was constructed from relational data in SCHOLAT, with LLMs used to enrich the graph semantically. Secondly, a KGAG retrieval model was proposed and combined with RAG to realize multi-path hybrid retrieval, improving the LLM's retrieval precision. Finally, fine-tuning techniques were applied so that the model's generation quality improved across academic fields. Experimental results show that ScholatGPT achieves a precision of 83.2% on academic question answering tasks, 69.4 and 11.5 percentage points higher than GPT-4o and AMiner AI, respectively, and performs well on tasks such as scholar profiling, representative work identification, and research field classification. In answer relevance, coherence, and readability, ScholatGPT obtains stable and competitive results, striking a good balance between professionalism and readability. In addition, intelligent applications built on ScholatGPT, such as a scholar think tank and an academic information recommendation system, effectively improve the efficiency of academic information acquisition.

Key words: Large Language Model (LLM), Academic Social Network (ASN), knowledge graph, knowledge injection, SCHOLAT
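
The abstract describes multi-path hybrid retrieval, in which knowledge graph lookups and RAG-style passage retrieval both feed the model's context. The following is only a minimal sketch of that general idea, not the authors' KGAG implementation. Everything in it is an assumption made for illustration: the toy triples and passages, the function names, the token-overlap scoring (a stand-in for embedding similarity), and the use of reciprocal rank fusion to merge the two paths, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples standing in for a scholar knowledge graph.
TRIPLES = [
    ("Scholar A", "works_on", "knowledge graphs"),
    ("Scholar A", "authored", "Paper X"),
    ("Scholar B", "works_on", "recommendation systems"),
]

# Hypothetical passages standing in for the document store of a RAG pipeline.
PASSAGES = {
    "p1": "Paper X studies knowledge graphs for academic search",
    "p2": "Scholar B publishes on recommendation systems",
    "p3": "Survey of operating system schedulers",
}


def tokens(text):
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def graph_retrieve(query):
    """Path 1: return triples whose subject or object is mentioned in the query."""
    q = query.lower()
    hits = [t for t in TRIPLES if t[0].lower() in q or t[2].lower() in q]
    return [" ".join(t) for t in hits]


def text_retrieve(query, k=3):
    """Path 2: rank passages by token overlap with the query (assumed stand-in for vector search)."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), pid) for pid, text in PASSAGES.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [PASSAGES[pid] for _, pid in ranked[:k]]


def fuse(rankings, k=60):
    """Merge ranked lists with reciprocal rank fusion: score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


def build_prompt(query):
    """Assemble the fused context and the question into a single prompt string."""
    context = fuse([graph_retrieve(query), text_retrieve(query)])
    lines = [f"- {c}" for c in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("Scholar A knowledge graphs"))
```

In this sketch the graph path contributes structured facts about a named scholar while the text path contributes free-form passages, and the fused list would be passed to an LLM as context. A production system would replace both toy retrievers with a real graph store and a vector index.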

CLC Number: