Journal of Computer Applications

Construction of a digital twin water conservancy knowledge graph integrating a large language model and prompt learning

YANG Yan, YE Feng, XU Dong, ZHANG Xuejie, XU Jing   

  • Received: 2024-05-07 Revised: 2024-07-29 Online: 2024-08-26 Published: 2024-08-26
  • Contact: YE Feng

融合大语言模型和提示学习的数字孪生水利知识图谱构建

杨燕,叶枫,许栋,张雪洁,徐津   

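  1. Hohai University
  • Corresponding author: YE Feng
  • Funding:
    National Key Research and Development Program of China; National Natural Science Foundation of China; Water Conservancy Science and Technology Project of Jiangsu Province; Fundamental Research Funds for the Central Universities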

Abstract: Constructing a knowledge graph for digital twin water conservancy construction reveals latent relationships among water conservancy construction objects and can help practitioners optimise design schemes and decision-making. Digital twin water conservancy construction is strongly interdisciplinary and has a complex knowledge structure, while general-purpose knowledge extraction models have not learned water conservancy domain knowledge and therefore extract it with insufficient accuracy. To improve extraction accuracy, a knowledge extraction method for the digital twin water conservancy domain based on a large language model (LLM) is proposed. The method uses LangChain to deploy a local LLM and integrate digital twin water conservancy domain knowledge, fine-tunes the LLM through prompt learning, extracts knowledge using the LLM's semantic understanding and generation capabilities, and refines the entity extraction results with a purpose-designed alignment strategy for entities from heterogeneous sources. Comparison and ablation experiments are conducted on a water conservancy domain corpus to verify the method's effectiveness. In the comparison experiments, the proposed method is more accurate than the deep-learning-based BiLSTM-CRF named entity recognition model and the UIE model, achieving F1 scores of 88.63% for entity extraction and 84.46% for relation extraction, with an entity extraction precision of 90.11%. In the ablation experiments, the proposed method improves the F1 scores of entity extraction and relation extraction by 5.5 and 3.2 percentage points, respectively, over the LLM baseline. The method therefore constructs a digital twin water conservancy construction knowledge graph while safeguarding the quality of the graph.
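The abstract reports precision and F1 for entity extraction but not recall. As a reading aid, the sketch below assumes that F1 is the usual harmonic mean of precision and recall and that both reported figures come from the same evaluation set, then back-solves the recall they would imply. The function names are illustrative, and the derived recall is not a figure reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit for both, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Invert the F1 formula: R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than twice the precision")
    return f1 * precision / denominator


# Entity extraction figures taken from the abstract (percent).
entity_precision = 90.11
entity_f1 = 88.63

# Derived under the assumptions above; NOT a value reported in the abstract.
entity_recall = implied_recall(entity_precision, entity_f1)
print(f"implied entity recall: {entity_recall:.2f}%")

# Round trip: the derived recall reproduces the reported F1.
print(f"recomputed F1: {f1_score(entity_precision, entity_recall):.2f}%")
```

Under these assumptions the implied recall is roughly 87.2%, which would mean the method is slightly more conservative than exhaustive: it misses somewhat more entities than it mislabels.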

Key words: large language model, prompt learning, knowledge graph, knowledge extraction, digital twin water conservancy construction
