Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (4): 1169-1176. DOI: 10.11772/j.issn.1001-9081.2024030336

• Artificial intelligence •

Tender information extraction method based on prompt tuning of knowledge

Yiheng SUN1,2, Maofu LIU1,2

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China
    2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System (Wuhan University of Science and Technology), Wuhan, Hubei 430065, China
  • Received: 2024-03-27  Revised: 2024-07-06  Accepted: 2024-07-11  Online: 2024-08-30  Published: 2025-04-10
  • Contact: Maofu LIU
  • About author: SUN Yiheng, born in 2000 in Anlu, Hubei, M.S. candidate, CCF member. His research interests include natural language processing and information retrieval.
  • Supported by:
    “14th Five-Year Plan” Hubei Province Higher Education Institutions’ Advantageous and Characteristic Discipline (Cluster) Project (2023D0302)

Abstract:

Current information extraction tasks mainly rely on Large Language Models (LLMs). However, domain terms occur frequently in tender information, and the models lack the relevant prior knowledge, which results in low fine-tuning efficiency and poor extraction performance. Additionally, the extraction and generalization performance of the models depends to a great extent on the quality of the prompt information and on the way the prompt templates are constructed. To address these issues, a Tender Information Extraction method based on Prompt Learning (TIEPL) was proposed. Firstly, a prompt learning method for generative information extraction was used to inject domain knowledge into the LLM, thereby unifying the optimization of the pre-training and fine-tuning stages. Secondly, with the LoRA (Low-Rank Adaptation) fine-tuning method as the framework, a separate prompt training bypass was designed, and a keyword prompt template for tender scenarios was constructed, thereby strengthening the bidirectional association between the model's information extraction and the prompts. Experimental results on a self-built tender inviting and bid winning dataset show that, compared with the second-best method, UIE (Universal Information Extraction), TIEPL improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation) and BLEU-4 (BiLingual Evaluation Understudy) by 1.05 and 4.71 percentage points, respectively, and generates extraction results more accurately and completely. These results demonstrate the effectiveness of the proposed method in improving the accuracy and generalization of tender information extraction.
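To make the LoRA training bypass and the keyword prompt template concrete, the following is a minimal Python sketch assuming the Hugging Face transformers and peft libraries; the base model checkpoint, the template wording, and the LoRA hyperparameters are illustrative assumptions, not the configuration reported in the paper.

# Illustrative sketch only: the model name, template wording, and hyperparameters below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "hypothetical-causal-llm"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA bypass: low-rank adapter matrices are trained while the base weights stay frozen.
lora_cfg = LoraConfig(
    r=8,                                  # assumed adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

def build_prompt(document: str, keywords: list[str]) -> str:
    """Keyword prompt template for the tender scenario (hypothetical wording)."""
    fields = ", ".join(keywords)  # domain terms, e.g. tenderer, winning bidder, bid amount
    return (
        f"Tender keywords to extract: {fields}.\n"
        f"Announcement text:\n{document}\n"
        "Extraction result:"
    )

# The resulting prompt/answer pairs are tokenized and used to update only the LoRA
# parameters with a standard causal language-modeling objective.

In this sketch, the keyword list carries the injected domain knowledge and only the LoRA adapters are updated during fine-tuning, which mirrors the bypass-style prompt training described above at a schematic level.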

Key words: generative information extraction, Large Language Model (LLM), prompt learning, LoRA (Low-Rank Adaptation) fine-tuning, tender

CLC Number: