Journal of Computer Applications, 2025, Vol. 45, Issue (3): 785-793. DOI: 10.11772/j.issn.1001-9081.2024050570

• Frontier research and typical applications of large models •

Construction of digital twin water conservancy knowledge graph integrating large language model and prompt learning

Yan YANG¹, Feng YE¹,², Dong XU²,³, Xuejie ZHANG¹, Jin XU²,³,⁴

  1. College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu 211100, China
  2. Key Laboratory of Hydrologic-Cycle and Hydrodynamic-System of Ministry of Water Resources (Hohai University), Nanjing, Jiangsu 210024, China
  3. College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing, Jiangsu 210098, China
  4. The National Key Laboratory of Water Disaster Prevention (Hohai University), Nanjing, Jiangsu 210098, China
  • Received: 2024-05-09 Revised: 2024-08-03 Accepted: 2024-08-08 Online: 2025-03-17 Published: 2025-03-10
  • Contact: Feng YE
  • About the authors: YANG Yan, born in 1999 in Yichun, Jiangxi, M.S. candidate, CCF member. Her research interests include knowledge graph construction and data mining.
    XU Dong, born in 1980 in Shanxian, Shandong, Ph.D., professor. His research interests include water conservancy informatization.
    ZHANG Xuejie, born in 1979 in Tieling, Liaoning, Ph.D., engineer. Her research interests include cloud computing.
    XU Jin, born in 1992 in Suzhou, Jiangsu, Ph.D., assistant research fellow. His research interests include water conservancy informatization.
  • Supported by:
    National Key Research and Development Program of China (2022YFC3202600); Major Scientific and Technological Project of Ministry of Water Resources (SKS-2022139)

Abstract:

Constructing a knowledge graph for digital twin water conservancy construction, and mining the potential relationships between water conservancy construction objects, can help practitioners optimize design schemes and decisions. Digital twin water conservancy construction is interdisciplinary and has a complex knowledge structure, while general-purpose knowledge extraction models have not learned water conservancy domain knowledge and extract it with insufficient accuracy. To improve extraction accuracy, a Digital Twin water conservancy construction Knowledge Extraction method based on Large Language Model (DTKE-LLM) is proposed. The method deploys a local Large Language Model (LLM) through LangChain, integrates digital twin water conservancy domain knowledge, and fine-tunes the LLM with prompt learning. The LLM then extracts knowledge using its semantic understanding and generation capabilities, and a heterogeneous entity alignment strategy is designed to refine the entity extraction results. Comparison and ablation experiments were carried out on a water conservancy domain corpus to verify the effectiveness of DTKE-LLM. The comparison experiments show that DTKE-LLM achieves higher precision than both the deep learning-based BiLSTM-CRF (Bidirectional Long Short-Term Memory Conditional Random Field) named entity recognition model and the general information extraction model UIE (Universal Information Extraction). The ablation experiments show that, compared with ChatGLM2-6B (Chat Generative Language Model 2, 6 Billion parameters), DTKE-LLM improves the F1 scores of entity extraction and relation extraction by 5.5 and 3.2 percentage points, respectively. The proposed method therefore constructs the digital twin water conservancy construction knowledge graph while maintaining the quality of the constructed graph.

Key words: large language model (LLM), prompt learning, knowledge graph, knowledge extraction, digital twin water conservancy construction
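
The abstract reports precision and F1 gains in percentage points. As a reading aid, the sketch below shows how such scores are commonly computed for entity extraction by exact set matching. It is not the authors' evaluation code: the `(text, type)` entity representation, the function name `prf1`, and all toy entities are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical representation: an entity is a (surface text, entity type) pair.
Entity = Tuple[str, str]


def prf1(predicted: Iterable[Entity], gold: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact matching on (text, type)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold annotations and two toy system outputs (all invented).
gold = [
    ("reservoir", "Object"),
    ("flood forecasting model", "Model"),
    ("water level sensor", "Device"),
    ("digital twin platform", "Platform"),
]
baseline_pred = [
    ("reservoir", "Object"),
    ("water level sensor", "Device"),
    ("sensor", "Device"),  # partial span, counted as an error
]
improved_pred = [
    ("reservoir", "Object"),
    ("flood forecasting model", "Model"),
    ("water level sensor", "Device"),
    ("platform", "Platform"),  # partial span, counted as an error
]

_, _, baseline_f1 = prf1(baseline_pred, gold)
_, _, improved_f1 = prf1(improved_pred, gold)

# A gain in percentage points is the plain difference of the two
# scores expressed in percent, not a relative change.
gain_pp = (improved_f1 - baseline_f1) * 100
print(f"baseline F1={baseline_f1:.4f}, improved F1={improved_f1:.4f}, gain={gain_pp:.2f} pp")
```

Exact matching is the strictest common convention; the paper may use a different matching rule, so the numbers here only illustrate the arithmetic behind a "percentage point" improvement.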

CLC number: