Construction of digital twin water conservancy knowledge graph integrating large language model and prompt learning

doi:10.11772/j.issn.1001-9081.2024050570

Journal of Computer Applications, 2025, Vol. 45, Issue (3): 785-793. DOI: 10.11772/j.issn.1001-9081.2024050570

• Frontier research and typical applications of large models •

Yan YANG¹, Feng YE¹,², Dong XU²,³, Xuejie ZHANG¹, Jin XU²,³,⁴

1. College of Computer Science and Software Engineering, Hohai University, Nanjing Jiangsu 211100, China
2. Key Laboratory of Hydrologic-Cycle and Hydrodynamic-System of Ministry of Water Resources (Hohai University), Nanjing Jiangsu 210024, China
3. College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing Jiangsu 210098, China
4. The National Key Laboratory of Water Disaster Prevention (Hohai University), Nanjing Jiangsu 210098, China

Received: 2024-05-09 Revised: 2024-08-03 Accepted: 2024-08-08 Online: 2025-03-17 Published: 2025-03-10
Contact: Feng YE
About author: YANG Yan, born in 1999, M. S. candidate. Her research interests include knowledge graph construction and data mining.
XU Dong, born in 1980, Ph. D., professor. His research interests include water conservancy informatization.
ZHANG Xuejie, born in 1979, Ph. D., engineer. Her research interests include cloud computing.
XU Jin, born in 1992, Ph. D., assistant research fellow. His research interests include water conservancy informatization.
Supported by: National Key Research and Development Program of China (2022YFC3202600); Major Scientific and Technological Project of Ministry of Water Resources (SKS-2022139)

Abstract

Constructing a knowledge graph for digital twin water conservancy construction, and mining the latent relationships between water conservancy construction objects, can help practitioners optimize design schemes and decision-making. Digital twin water conservancy construction is interdisciplinary and has a complex knowledge structure, while general-purpose knowledge extraction models learn too little water conservancy domain knowledge and extract it with low accuracy. To improve extraction accuracy, a Digital Twin water conservancy construction Knowledge Extraction method based on Large Language Model (DTKE-LLM) was proposed. In this method, a local Large Language Model (LLM) was deployed through LangChain and integrated with digital twin water conservancy domain knowledge, and the LLM was fine-tuned by prompt learning. The semantic understanding and generation capabilities of the LLM were then used to extract knowledge, and a heterogeneous entity alignment strategy was designed to refine the entity extraction results. Comparison experiments and ablation experiments were carried out on a water conservancy domain corpus to verify the effectiveness of DTKE-LLM. Results of the comparison experiments show that DTKE-LLM achieves higher precision than the deep learning-based named entity recognition model BiLSTM-CRF (Bidirectional Long Short-Term Memory Conditional Random Field) and the general information extraction model UIE (Universal Information Extraction). Results of the ablation experiments show that, compared with ChatGLM2-6B (Chat Generative Language Model 2, 6 billion parameters), DTKE-LLM improves the F1 scores of entity extraction and relation extraction by 5.5 and 3.2 percentage points, respectively. The proposed method therefore constructs the digital twin water conservancy construction knowledge graph while maintaining the quality of the constructed graph.

Key words: Large Language Model (LLM), prompt learning, knowledge graph, knowledge extraction, digital twin water conservancy construction

CLC Number: TP391.1

Yan YANG, Feng YE, Dong XU, Xuejie ZHANG, Jin XU. Construction of digital twin water conservancy knowledge graph integrating large language model and prompt learning[J]. Journal of Computer Applications, 2025, 45(3): 785-793.

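Standard entity	Entities to be merged
GIS engine (GIS引擎)	GIS software (GIS软件), geographic information system software (地理信息系统软件), geographic information system engine (地理信息系统引擎)
Xin'anjiang model (新安江模型)	Xin'anjiang hydrological model (新安江水文模型), Xin'anjiang Model
Qiantang River basin (钱塘江流域)	Qiantang River waters (钱塘江水域), Qiantang River Basin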
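
The alignment examples above map several surface forms onto one standard entity. As a minimal sketch of that idea, the snippet below normalizes an extracted mention by dictionary lookup. The lookup strategy, the helper names `_key` and `align`, and the normalization rules are assumptions made for illustration; the abstract only states that a heterogeneous entity alignment strategy was designed and does not describe its mechanics.

```python
# Alias table copied from the entity alignment examples above.
# The dictionary-lookup strategy itself is an illustrative assumption,
# not the alignment method of the paper.
ALIASES = {
    "GIS引擎": ["GIS软件", "地理信息系统软件", "地理信息系统引擎"],
    "新安江模型": ["新安江水文模型", "Xin’anjiang Model"],
    "钱塘江流域": ["钱塘江水域", "Qiantang River Basin"],
}


def _key(surface):
    """Normalize a surface form: trim, lowercase, unify apostrophes."""
    return surface.strip().lower().replace("’", "'")


# Reverse index: normalized alias (or standard name) -> standard entity.
ALIAS_TO_STANDARD = {}
for standard, variants in ALIASES.items():
    for form in [standard, *variants]:
        ALIAS_TO_STANDARD[_key(form)] = standard


def align(surface):
    """Return the standard entity for a known alias, else the input unchanged."""
    return ALIAS_TO_STANDARD.get(_key(surface), surface)


if __name__ == "__main__":
    for mention in ["GIS软件", "qiantang river basin", "未知实体"]:
        print(mention, "->", align(mention))
```

A fixed table only covers aliases seen in advance; unseen variants pass through unchanged, which is where a learned or prompt-based strategy would be needed.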

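Label No.	Label name	Tag	Label No.	Label name	Tag
1	Physical water conservancy object	PO	10	Engine	ENG
2	Digital twin object	DSO	11	Simulate	SIM
3	Department	DEPT	12	Support	SUP
4	Business	BIZ	13	Include	INC
5	Information infrastructure	INFO	14	Implement	IMP
6	Platform	PLT	15	Depend on	DEP
7	Data foundation	DATA	16	Execute	EXE
8	Model	MOD	17	Originate from	ORIG
9	Knowledge	KNOW	18	Influence	INF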
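
To show how a label set like the one above could feed a knowledge extraction prompt, here is a hedged sketch. The tags are taken from the table; everything else is an assumption: the reading of labels 1 to 10 as entity types and 11 to 18 as relation types, the English glosses, the prompt wording, the output format, and the helper names `build_extraction_prompt` and `is_valid_triple`. It is not the prompt used in the paper.

```python
# Tags are copied from the label table above. The entity/relation split,
# the English glosses and the prompt wording are illustrative assumptions.
ENTITY_TAGS = {
    "PO": "physical water conservancy object",
    "DSO": "digital twin object",
    "DEPT": "department",
    "BIZ": "business",
    "INFO": "information infrastructure",
    "PLT": "platform",
    "DATA": "data foundation",
    "MOD": "model",
    "KNOW": "knowledge",
    "ENG": "engine",
}
RELATION_TAGS = {
    "SIM": "simulates",
    "SUP": "supports",
    "INC": "includes",
    "IMP": "implements",
    "DEP": "depends on",
    "EXE": "executes",
    "ORIG": "originates from",
    "INF": "influences",
}


def build_extraction_prompt(text):
    """Assemble a plain-text prompt that lists the allowed tags (hypothetical wording)."""
    entity_lines = "\n".join(f"- {tag}: {name}" for tag, name in ENTITY_TAGS.items())
    relation_lines = "\n".join(f"- {tag}: {name}" for tag, name in RELATION_TAGS.items())
    return (
        "Extract entities and relations from the text below.\n"
        f"Allowed entity types:\n{entity_lines}\n"
        f"Allowed relation types:\n{relation_lines}\n"
        "Answer with one triple per line as: head|head_type|relation|tail|tail_type\n"
        f"Text: {text}"
    )


def is_valid_triple(triple):
    """Check that a (head, head_type, relation, tail, tail_type) tuple uses known tags."""
    head, head_tag, relation_tag, tail, tail_tag = triple
    return (
        bool(head)
        and bool(tail)
        and head_tag in ENTITY_TAGS
        and tail_tag in ENTITY_TAGS
        and relation_tag in RELATION_TAGS
    )


if __name__ == "__main__":
    print(build_extraction_prompt("新安江模型用于模拟钱塘江流域的洪水过程。"))
```

Validating model output against the closed tag set is one simple way to discard triples with labels the schema does not define.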

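Task	Model	Precision/%	Recall/%	F1/%
Entity extraction	BiLSTM-CRF	76.280	71.775	73.959
Entity extraction	UIE-base	82.082	80.077	81.067
Entity extraction	ChatGLM2-6B	84.661	81.572	83.088
Entity extraction	DTKE-ChatGLM2-6B	90.112	87.195	88.630
Relation extraction	UIE-base	79.907	70.475	74.895
Relation extraction	ChatGLM2-6B	81.554	80.970	81.261
Relation extraction	DTKE-ChatGLM2-6B	86.125	82.854	84.458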
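
The F1 column can be cross-checked against the precision and recall columns with the standard definition F1 = 2PR / (P + R). The sketch below recomputes it for each row of the comparison table and derives the percentage-point gains over ChatGLM2-6B quoted in the abstract. The variable and function names are illustrative, and small differences in the last digit are expected because the published inputs are already rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (task, model, precision, recall, reported F1), copied from the table above.
ROWS = [
    ("entity", "BiLSTM-CRF", 76.280, 71.775, 73.959),
    ("entity", "UIE-base", 82.082, 80.077, 81.067),
    ("entity", "ChatGLM2-6B", 84.661, 81.572, 83.088),
    ("entity", "DTKE-ChatGLM2-6B", 90.112, 87.195, 88.630),
    ("relation", "UIE-base", 79.907, 70.475, 74.895),
    ("relation", "ChatGLM2-6B", 81.554, 80.970, 81.261),
    ("relation", "DTKE-ChatGLM2-6B", 86.125, 82.854, 84.458),
]


def gain_over_baseline(task):
    """Reported F1 of DTKE-ChatGLM2-6B minus reported F1 of ChatGLM2-6B."""
    reported = {model: score for t, model, _, _, score in ROWS if t == task}
    return reported["DTKE-ChatGLM2-6B"] - reported["ChatGLM2-6B"]


if __name__ == "__main__":
    for task, model, p, r, reported in ROWS:
        print(f"{task:8s} {model:18s} recomputed={f1(p, r):.3f} reported={reported:.3f}")
    for task in ("entity", "relation"):
        print(f"{task}: +{gain_over_baseline(task):.1f} percentage points")
```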

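Task	Setting	Precision/%	Recall/%	F1/%
Entity extraction	Full method	90.112	87.195	88.630
Entity extraction	Without domain knowledge injection	88.291	87.055	87.669
Entity extraction	Without knowledge extraction prompt	85.276	84.332	84.801
Entity extraction	Without entity alignment prompt	89.157	86.002	87.551
Relation extraction	Full method	86.125	82.854	84.458
Relation extraction	Without domain knowledge injection	85.245	81.347	83.250
Relation extraction	Without knowledge extraction prompt	82.007	81.256	81.630
Relation extraction	Without entity alignment prompt	86.025	82.854	84.410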
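
One way to read the ablation table is as the F1 lost when each component is removed. The sketch below computes those drops from the reported F1 values only; the dictionary layout, the English setting names and the function name `f1_drops` are illustrative choices rather than anything defined in the paper.

```python
# Reported F1 values (percent) from the ablation table above.
ABLATION_F1 = {
    "entity": {
        "full method": 88.630,
        "without domain knowledge injection": 87.669,
        "without knowledge extraction prompt": 84.801,
        "without entity alignment prompt": 87.551,
    },
    "relation": {
        "full method": 84.458,
        "without domain knowledge injection": 83.250,
        "without knowledge extraction prompt": 81.630,
        "without entity alignment prompt": 84.410,
    },
}


def f1_drops(task):
    """Percentage-point F1 drop of each ablated setting relative to the full method."""
    scores = ABLATION_F1[task]
    full = scores["full method"]
    return {
        setting: full - score
        for setting, score in scores.items()
        if setting != "full method"
    }


if __name__ == "__main__":
    for task in ABLATION_F1:
        drops = f1_drops(task)
        for setting, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
            print(f"{task:8s} {setting:38s} -{drop:.3f}")
```

On these numbers, removing the knowledge extraction prompt costs the most F1 for both tasks, while removing the entity alignment prompt barely changes relation extraction.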
