

Entity-relation extraction strategy in Chinese open-domain based on large language model

龚永罡,陈舒汉*,廉小亲,李乾生,莫鸿铭,刘宏宇   

  1. School of Computer Science and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
  • Received: 2024-10-30  Revised: 2025-03-20  Accepted: 2025-03-27  Online: 2025-04-21  Published: 2025-04-21
  • Corresponding author: 陈舒汉
  • Supported by:
    2024 Graduate Education and Teaching Achievement Cultivation Project of Beijing Technology and Business University




Abstract: Large Language Model (LLM) suffers from unstable extraction performance on the Chinese open-domain entity-relation extraction task, with notably low precision when recognizing texts and annotation categories from certain specialized domains. To address this, a Chinese open-domain entity-relation extraction strategy based on LLM, called Multi-Level Dialogue Strategy for Large Language Model (MLDS-LLM), was proposed. The strategy leverages the strong semantic understanding and transfer learning capabilities of the LLM to perform entity-relation extraction through multi-turn dialogues on different sub-tasks. First, text summaries were generated by the LLM on the basis of the structured logic of open-domain texts and a chain-of-thought mechanism, avoiding relational and factual hallucinations as well as the model's inability to attend to later parts of the text. Second, the limitation of the context window was reduced through a text simplification strategy and the introduction of a replaceable vocabulary. Finally, multi-level prompt templates were constructed from the structured summaries and simplified texts, and the influence of the parameter temperature on entity-relation extraction was investigated using the LLaMA-2-70B model. The Precision (P), Recall (R), F1 score, and Exact Match (EM) of entity-relation extraction with LLaMA-2-70B were measured before and after applying the proposed strategy. Experimental results demonstrate that the strategy improves the performance of the LLM on both Named Entity Recognition (NER) and Relation Extraction (RE) across five Chinese datasets from different domains: CL-NE-DS, DiaKG, CCKS2021, DuIE, and IEPA. In particular, on the highly specialized DiaKG and IEPA datasets, where zero-shot results were poor, the precision of NER improved by 9.3% and 6.7% respectively over few-shot prompt testing, with EM rising by 2.7% and 2.2%; the precision of RE improved by 12.2% and 16.0%, and F1 by 10.7% and 10.0%. These results verify that the proposed strategy effectively improves LLM entity-relation extraction and mitigates the instability of model performance.
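To make the three-stage dialogue flow described in the abstract concrete, the following Python sketch organizes the stages (chain-of-thought summarization, vocabulary-based simplification, and multi-level extraction prompting) around a generic chat() client. The prompt wording, the function names (summarize, simplify, extract, mlds_pipeline), the Chat stub, and the default temperature values are all illustrative assumptions, not the authors' actual templates.

    # Minimal sketch of a multi-level dialogue pipeline in the spirit of MLDS-LLM.
    # All prompts and names here are illustrative assumptions.
    from typing import Callable

    # Plug in any chat-completion client (e.g. a LLaMA-2-70B endpoint):
    # takes a prompt and a sampling temperature, returns the model's text.
    Chat = Callable[[str, float], str]

    def summarize(chat: Chat, document: str, temperature: float = 0.2) -> str:
        """Stage 1: chain-of-thought summary that preserves entity-bearing facts."""
        prompt = (
            "Read the passage and reason step by step about which entities and "
            "relations it asserts, then write a structured summary that keeps "
            "every entity mention.\n\nPassage:\n" + document
        )
        return chat(prompt, temperature)

    def simplify(chat: Chat, document: str, vocabulary: dict[str, str],
                 temperature: float = 0.2) -> str:
        """Stage 2: shorten the text to ease context-window pressure, replacing
        long expressions with short aliases from a replaceable vocabulary."""
        aliases = "\n".join(f"{phrase} -> {alias}"
                            for phrase, alias in vocabulary.items())
        prompt = (
            "Rewrite the passage as briefly as possible without dropping facts. "
            "Use these replacements:\n" + aliases + "\n\nPassage:\n" + document
        )
        return chat(prompt, temperature)

    def extract(chat: Chat, summary: str, simplified: str,
                temperature: float = 0.2) -> str:
        """Stage 3: multi-level prompt built from the summary plus the simplified
        text; asks for entities first, then (head, relation, tail) triples."""
        prompt = (
            "Summary:\n" + summary + "\n\nText:\n" + simplified +
            "\n\nStep 1: list the named entities with their types. "
            "Step 2: output every relation as (head, relation, tail)."
        )
        return chat(prompt, temperature)

    def mlds_pipeline(chat: Chat, document: str,
                      vocabulary: dict[str, str]) -> str:
        """Run the three dialogue turns in order; return the extraction output."""
        summary = summarize(chat, document)
        simplified = simplify(chat, document, vocabulary)
        return extract(chat, summary, simplified)

In this sketch the temperature defaults are placeholders; the abstract reports that the effect of this parameter was investigated with LLaMA-2-70B, so suitable per-stage values would come from the paper's experiments.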

Key words: Large Language Model (LLM), Chinese open-domain, Named Entity Recognition (NER), Relation Extraction (RE), prompt learning
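For quick reference, the four metrics quoted in the abstract follow their standard definitions. In the formulas below, TP, FP, and FN denote true positives, false positives, and false negatives over predicted entities or relation triples; the EM definition given here is the common sample-level one and may differ in detail from the paper's.

    P  = TP / (TP + FP)
    R  = TP / (TP + FN)
    F1 = 2 * P * R / (P + R)
    EM = (samples whose full prediction set exactly matches the gold annotation) / (total samples)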
