《计算机应用》唯一官方网站 ›› 2025, Vol. 45 ›› Issue (1): 75-81.DOI: 10.11772/j.issn.1001-9081.2023121843

• 人工智能 •

基于提示学习和全局指针网络的中文古籍实体关系联合抽取方法

李斌1, 林民1(), 斯日古楞2,3, 高颖杰1, 王玉荣4, 张树钧1,3

  1. 内蒙古师范大学 计算机科学技术学院,呼和浩特 010022
    2.内蒙古民族大学 计算机科学与技术学院,内蒙古 通辽 028000
    3.内蒙古师范大学 文学院,呼和浩特 010022
    4.内蒙古师范大学 数学科学学院,呼和浩特 010022
  • 收稿日期:2024-01-04 修回日期:2024-02-28 接受日期:2024-03-04 发布日期:2024-03-11 出版日期:2025-01-10
  • 通讯作者: 林民
  • 作者简介:李斌(1998—),男,内蒙古乌兰察布人,硕士研究生,CCF会员,主要研究方向:实体关系抽取、自然语言处理;
    斯日古楞(1991—),女(蒙古族),内蒙古通辽人,博士研究生,主要研究方向:计算语言学、自然语言处理;
    高颖杰(1999—),女,内蒙古锡林郭勒人,硕士研究生,主要研究方向:自然语言处理;
    王玉荣(1989—),女(蒙古族),内蒙古呼和浩特人,博士研究生,主要研究方向:机器翻译、自然语言处理;
    张树钧(1979—),男,内蒙古呼和浩特人,博士研究生,主要研究方向:语言文字应用、自然语言处理。
  • 基金资助:
    国家自然科学基金资助项目(62266033);内蒙古自然科学基金资助项目(2021LHMS06010);内蒙古自治区科技计划项目(2021GG0218);内蒙古自治区级教育部重点实验室开放课题(2023KFZD03);内蒙古自治区硕士研究生科研创新项目(S20231076Z);内蒙古师范大学基本科研业务费专项资金资助项目(2022JBXC018)

Joint entity-relation extraction method for ancient Chinese books based on prompt learning and global pointer network

Bin LI1, Min LIN1(), Siriguleng2,3, Yingjie GAO1, Yurong WANG4, Shujun ZHANG1,3   

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot, Inner Mongolia 010022, China
    2. College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao, Inner Mongolia 028000, China
    3. College of Language and Literature, Inner Mongolia Normal University, Hohhot, Inner Mongolia 010022, China
    4. College of Mathematics Sciences, Inner Mongolia Normal University, Hohhot, Inner Mongolia 010022, China
  • Received:2024-01-04 Revised:2024-02-28 Accepted:2024-03-04 Online:2024-03-11 Published:2025-01-10
  • Contact: Min LIN
  • About author:LI Bin, born in 1998, M. S. candidate. His research interests include entity-relation extraction, natural language processing.
    Siriguleng, born in 1991, Ph. D. candidate. Her research interests include computational linguistics, natural language processing.
    GAO Yingjie, born in 1999, M. S. candidate. Her research interests include natural language processing.
    WANG Yurong, born in 1989, Ph. D. candidate. Her research interests include machine translation, natural language processing.
    ZHANG Shujun, born in 1979, Ph. D. candidate. His research interests include applied linguistics, natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62266033); Natural Science Foundation of Inner Mongolia (2021LHMS06010); Science and Technology Program of Inner Mongolia (2021GG0218); Open Program of Key Laboratory of Inner Mongolia, Ministry of Education (2023KFZD03); Graduate Student Scientific Research Innovation Project in Inner Mongolia (S20231076Z); Fundamental Research Fund of Inner Mongolia Normal University (2022JBXC018)

摘要:

基于“预训练+微调”范式的实体关系联合抽取方法依赖大规模标注数据,在数据标注难度大、成本高的中文古籍小样本场景下微调效率低,抽取性能不佳;中文古籍中普遍存在实体嵌套和关系重叠的问题,限制了实体关系联合抽取的效果;管道式抽取方法存在错误传播问题,影响抽取效果。针对以上问题,提出一种基于提示学习和全局指针网络的中文古籍实体关系联合抽取方法。首先,利用区间抽取式阅读理解的提示学习方法对预训练语言模型(PLM)注入领域知识以统一预训练和微调的优化目标,并对输入句子进行编码表示;其次,使用全局指针网络分别对主、客实体边界和不同关系下的主、客实体边界进行预测和联合解码,对齐成实体关系三元组,并构建了PTBG (Prompt Tuned BERT with Global pointer)模型,解决实体嵌套和关系重叠问题,同时避免了管道式解码的错误传播问题;最后,在上述工作基础上分析了不同提示模板对抽取性能的影响。在《史记》数据集上进行实验的结果表明,相较于注入领域知识前后的OneRel模型,PTBG模型所取得的F1值分别提升了1.64和1.97个百分点。可见,PTBG模型能更好地对中文古籍实体关系进行联合抽取,为低资源的小样本深度学习场景提供了新的研究思路与方法。

关键词: 实体关系联合抽取, 全局指针网络, 提示学习, 预训练语言模型, 中文古籍

Abstract:

Joint entity-relation extraction methods based on the “pre-training + fine-tuning” paradigm rely on large-scale annotated data. In few-shot scenarios of ancient Chinese books, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor. Moreover, entity nesting and relation overlapping are common in ancient Chinese books, which limits the effect of joint entity-relation extraction, and pipeline extraction methods suffer from error propagation, which further degrades extraction results. To address these problems, a joint entity-relation extraction method for ancient Chinese books based on prompt learning and a global pointer network was proposed. Firstly, a prompt learning method based on span-extraction reading comprehension was used to inject domain knowledge into the Pre-trained Language Model (PLM), unifying the optimization objectives of pre-training and fine-tuning, and the input sentences were encoded. Then, global pointer networks were used to predict and jointly decode the subject and object entity boundaries as well as the subject and object boundaries under different relations, aligning them into entity-relation triples and completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model. In this way, the entity nesting and relation overlapping problems were solved, and the error propagation of pipeline decoding was avoided. Finally, on the basis of the above work, the influence of different prompt templates on extraction performance was analyzed. Experimental results on the Records of the Grand Historian dataset show that, compared with the OneRel model before and after domain-knowledge injection, the PTBG model improves the F1 score by 1.64 and 1.97 percentage points respectively. It can be seen that the PTBG model extracts entity-relation triples from ancient Chinese books more effectively, and provides new research ideas and methods for low-resource, few-shot deep learning scenarios.
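The global pointer idea summarized above (score every candidate span, so nested entities can fire independently) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrices, threshold, and function names are assumptions, and the full PTBG model additionally uses per-relation pointer heads and a BERT encoder, omitted here.

```python
import numpy as np

def global_pointer_scores(h, Wq, Wk):
    """Score all candidate spans [i, j] of a sentence.

    h:  (seq_len, d) token encodings (e.g. from a PLM; random here).
    Wq, Wk: (d, dk) illustrative query/key projection matrices.
    """
    q = h @ Wq                      # (L, dk) span-start representations
    k = h @ Wk                      # (L, dk) span-end representations
    s = q @ k.T                     # (L, L): s[i, j] scores span [i, j]
    # Mask invalid spans where end < start: only the upper triangle
    # (including the diagonal, i.e. single-token spans) is kept.
    L = h.shape[0]
    mask = np.triu(np.ones((L, L), dtype=bool))
    return np.where(mask, s, -np.inf)

def decode_spans(scores, threshold=0.0):
    """Every (start, end) pair above the threshold is an entity span.

    Nested spans can both exceed the threshold, which is how
    the pointer formulation handles entity nesting without BIO tags.
    """
    starts, ends = np.where(scores > threshold)
    return list(zip(starts.tolist(), ends.tolist()))
```

In the joint setting described in the abstract, one such scoring head would detect subject/object boundaries and additional heads would score spans under each relation type, after which the decoded spans are aligned into (subject, relation, object) triples.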

Key words: joint entity-relation extraction, global pointer network, prompt learning, Pre-trained Language Model (PLM), ancient Chinese books
