Journal of Computer Applications (《计算机应用》), official website


Entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network

LI Bin1, LIN Min2, Siriguleng1, GAO Yingjie1, WANG Yurong1, ZHANG Shujun3

  1. Inner Mongolia Normal University
    2. College of Information Engineering and Computer Science, Inner Mongolia Normal University
    3. College of Computer and Information Engineering, Inner Mongolia Normal University, Saihan District, Hohhot, Inner Mongolia
  • Received: 2024-01-04; Revised: 2024-02-28; Online: 2024-03-11; Published: 2024-03-11
  • Corresponding author: LIN Min
  • Supported by:
    Inner Mongolia Natural Science Foundation; Inner Mongolia Autonomous Region Science and Technology Plan Project; Open Project of the Inner Mongolia Autonomous Region Key Laboratory of the Ministry of Education; Inner Mongolia Autonomous Region Postgraduate Research and Innovation Project; Fundamental Research Funds of Inner Mongolia Normal University; National Natural Science Foundation of China

Entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network

  • Received:2024-01-04 Revised:2024-02-28 Online:2024-03-11 Published:2024-03-11

Abstract: Entity-relation joint extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data; in the few-shot scenario of Chinese ancient text, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor. Entity nesting and relation overlapping are common in Chinese ancient text and limit the effectiveness of entity-relation joint extraction, while pipeline extraction methods suffer from error propagation, which hurts extraction accuracy. To address these problems, an entity-relation joint extraction method for Chinese ancient text based on prompt learning and a global pointer network was proposed. First, a prompt learning method based on span-extraction reading comprehension was used to inject domain knowledge into a pre-trained language model (PLM), so as to unify the optimization objectives of pre-training and fine-tuning, and to encode the input sentences. Then, global pointer networks were used to predict the boundaries of subject and object entities as well as the subject and object boundaries under different relations; these predictions were jointly decoded and aligned into entity-relation triples, completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model, which resolves entity nesting and relation overlapping and avoids the error propagation of pipeline decoding. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. In experiments on the Shiji (Records of the Grand Historian) dataset, the proposed method improved the F1 score by 4.71 and 1.91 percentage points over the OneRel model before and after domain-knowledge injection, respectively. The experimental results show that the proposed method better performs joint extraction of entities and relations from Chinese ancient text, and provides a new approach for low-resource few-shot deep learning scenarios.

Keywords: entity-relation joint extraction, global pointer network, prompt learning, pre-trained language model, Chinese ancient text

Abstract: The entity-relation joint extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data. In the few-shot scenario of Chinese ancient text, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor; entity nesting and relation overlapping problems are common in Chinese ancient text, which limit the effect of entity-relation joint extraction; and pipeline extraction methods suffer from error propagation, which reduces extraction accuracy. In response to the above problems, an entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network was proposed. First, a prompt learning method of span-extraction reading comprehension was used to inject domain knowledge into the Pre-trained Language Model (PLM), so as to unify the optimization objectives of pre-training and fine-tuning, and to encode the input sentences. Then, global pointer networks were used to predict the boundaries of subject and object entities as well as the subject and object boundaries under different relations, and the predictions were jointly decoded and aligned into entity-relation triples, completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model. The model solves the entity nesting and relation overlapping problems, and avoids the error propagation of pipeline decoding. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. Experiments were conducted on the Shiji (Records of the Grand Historian) dataset: compared with the OneRel model before and after domain-knowledge injection, the F1 score of the proposed method increased by 4.71 and 1.91 percentage points, respectively. Experimental results show that the proposed method can better jointly extract entity-relation triples from Chinese ancient text, and provides new research ideas and methods for low-resource few-shot deep learning scenarios.
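The global pointer mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' PTBG implementation; it assumes the standard global-pointer span-scoring scheme, in which every candidate span (i, j) receives a bilinear score between projected token representations, and all names, dimensions, and the decoding threshold below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden, head_dim = 6, 16, 8

# Token encodings as a PLM encoder (e.g. BERT) would produce them
# for one sentence; random here for illustration only.
H = rng.normal(size=(seq_len, hidden))

def global_pointer_scores(H, Wq, Wk):
    """Score every candidate span (i, j): s[i, j] = (Wq h_i) . (Wk h_j).
    Spans with i > j are masked to -inf so only valid spans survive."""
    q = H @ Wq              # (seq_len, head_dim) start-position projections
    k = H @ Wk              # (seq_len, head_dim) end-position projections
    s = q @ k.T             # (seq_len, seq_len) score for each (start, end)
    mask = np.triu(np.ones_like(s, dtype=bool))  # keep i <= j only
    return np.where(mask, s, -np.inf)

# One pointer head per prediction task; a joint-extraction model would use
# separate heads for subject spans, object spans, and one head per relation.
Wq_subj, Wk_subj = rng.normal(size=(2, hidden, head_dim))
subj_scores = global_pointer_scores(H, Wq_subj, Wk_subj)

# Decode: keep every span whose score clears a threshold (0.0 here).
threshold = 0.0
subjects = [(i, j) for i in range(seq_len) for j in range(seq_len)
            if subj_scores[i, j] > threshold]
```

Because every valid (i, j) pair is scored independently, two overlapping or nested spans can both exceed the threshold and both be extracted, which is how this design accommodates entity nesting; relation-specific heads then let the decoder align subject and object boundaries into triples without a pipeline step.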

Key words: entity-relation joint extraction, global pointer network, prompt learning, Pre-trained Language Model (PLM), Chinese ancient text

CLC number: