Entity-relation extraction strategy in Chinese open-domains based on large language model
Yonggang GONG, Shuhan CHEN, Xiaoqin LIAN, Qiansheng LI, Hongming MO, Hongyu LIU
Journal of Computer Applications, 2025, 45(10): 3121-3130. DOI: 10.11772/j.issn.1001-9081.2024101536

Large Language Models (LLMs) suffer from unstable extraction performance in Entity-Relation Extraction (ERE) tasks in Chinese open domains, and their precision in recognizing texts and annotated categories drops in certain specialized fields. To address these issues, a Chinese open-domain entity-relation extraction strategy based on LLMs, called Multi-Level Dialog Strategy for Large Language Model (MLDS-LLM), was proposed. In this strategy, the strong semantic understanding and transfer-learning capabilities of LLMs were exploited to perform entity-relation extraction through multi-turn dialogues over different sub-tasks. First, structured summaries were generated by the LLM from the structural logic of the open-domain text using a Chain-of-Thought (CoT) mechanism, mitigating the model's relational and factual hallucinations as well as its inability to take subsequent information into account. Then, the limitation of the context window was alleviated through a text simplification strategy and the introduction of a replaceable vocabulary. Finally, multi-level prompt templates were constructed from the structured summaries and simplified texts; the influence of the temperature parameter on ERE was explored using the LLaMA-2-70B model, and the Precision, Recall, F1 score, and Exact Match (EM) of entity-relation extraction by LLaMA-2-70B were measured before and after applying the proposed strategy. Experimental results demonstrate that the strategy improves the Named Entity Recognition (NER) and Relation Extraction (RE) performance of the LLM on five Chinese datasets from different domains, including CL-NE-DS, DiaKG, and CCKS2021. On the DiaKG and IEPA datasets in particular, which are highly specialized and on which the model's zero-shot results are poor, the strategy improved NER precision over few-shot prompt tests by 9.3 and 6.7 percentage points with EM increased by 2.7 and 2.2 percentage points respectively, and improved RE precision by 12.2 and 16.0 percentage points with F1 increased by 10.7 and 10.0 percentage points respectively. These results confirm that the proposed strategy effectively enhances LLM performance in ERE and alleviates the problem of unstable extraction.
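The multi-turn flow described in the abstract can be made concrete with a short sketch. The Python below is illustrative only: the chat() interface, the prompt wording, and the exact turn ordering are assumptions layered on the abstract's description (a CoT structured summary, text simplification with a replaceable vocabulary, then entity and relation prompts), not the authors' implementation.

```python
# Hypothetical sketch of the MLDS-LLM multi-turn dialog, assuming a generic
# chat() interface; prompts and ordering are illustrative, not the paper's.

def chat(messages: list[dict], temperature: float = 0.2) -> str:
    """Placeholder for a call to an LLM such as LLaMA-2-70B."""
    raise NotImplementedError("wire this to your model endpoint")


def extract_entities_relations(text: str) -> str:
    history = []

    # Turn 1: chain-of-thought structured summary, grounding later turns
    # and curbing relational/factual hallucination.
    history.append({"role": "user", "content":
        "Summarize the following text as a structured outline. "
        "Think step by step before writing the outline.\n\n" + text})
    history.append({"role": "assistant", "content": chat(history)})

    # Turn 2: simplify the text and substitute long mentions with short
    # placeholders (a replaceable vocabulary) to fit the context window.
    history.append({"role": "user", "content":
        "Rewrite the text more concisely. Replace long entity mentions "
        "with short placeholders and list the placeholder -> mention map."})
    history.append({"role": "assistant", "content": chat(history)})

    # Turn 3: named entity recognition over the simplified text.
    history.append({"role": "user", "content":
        "List every named entity in the simplified text with its type."})
    history.append({"role": "assistant", "content": chat(history)})

    # Turn 4: relation extraction conditioned on the recognized entities.
    history.append({"role": "user", "content":
        "For each pair of related entities, output a (head, relation, tail) "
        "triple, restoring original mentions from the placeholder map."})
    return chat(history)
```

In the paper's setup such a dialog would run against LLaMA-2-70B at a chosen temperature, with the returned triples scored by Precision, Recall, F1, and EM against the gold annotations.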
