Journal of Computer Applications (《计算机应用》), official website


Entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network

LI Bin1, LIN Min2, Siriguleng1, GAO Yingjie1, WANG Yurong1, ZHANG Shujun3

  1. Inner Mongolia Normal University
    2. College of Information Engineering and Computer Science, Inner Mongolia Normal University
    3. College of Computer and Information Engineering, Inner Mongolia Normal University, Saihan District, Hohhot, Inner Mongolia
  • Received: 2024-01-04; Revised: 2024-02-28; Online: 2024-03-11; Published: 2024-03-11
  • Corresponding author: LIN Min
  • Supported by:
    Inner Mongolia Natural Science Foundation; Inner Mongolia Autonomous Region Science and Technology Plan Project; Open Project of the Inner Mongolia Autonomous Region Key Laboratory of the Ministry of Education; Inner Mongolia Autonomous Region Postgraduate Research and Innovation Project; Fundamental Research Funds of Inner Mongolia Normal University; National Natural Science Foundation of China

Entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network

  • Received:2024-01-04 Revised:2024-02-28 Online:2024-03-11 Published:2024-03-11

Abstract: Entity-relation joint extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data; in the few-shot scenario of Chinese ancient text, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor. Entity nesting and relation overlapping are common in Chinese ancient text and limit the effectiveness of entity-relation joint extraction, while pipeline extraction methods suffer from error propagation, which hurts extraction accuracy. To address these problems, an entity-relation joint extraction method for Chinese ancient text based on prompt learning and a global pointer network was proposed. First, a prompt learning method based on span-extraction reading comprehension was used to inject domain knowledge into a pre-trained language model (PLM), so as to unify the optimization objectives of pre-training and fine-tuning, and to encode the input sentences. Then, global pointer networks were used to predict the boundaries of subject and object entities as well as the subject and object boundaries under different relations; these predictions were jointly decoded and aligned into entity-relation triples, completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model, which resolves entity nesting and relation overlapping and avoids the error propagation of pipeline decoding. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. In experiments on the Shiji (Records of the Grand Historian) dataset, the proposed method improved the F1 score by 4.71 and 1.91 percentage points over the OneRel model before and after domain-knowledge injection, respectively. The experimental results show that the proposed method better performs joint extraction of entities and relations from Chinese ancient text, and provides a new approach for low-resource few-shot deep learning scenarios.

Keywords: entity-relation joint extraction, global pointer network, prompt learning, pre-trained language model, Chinese ancient text

Abstract: The entity-relation joint extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data. In the few-shot scenario of Chinese ancient text, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor; entity nesting and relation overlapping problems are common in Chinese ancient text, which limit the effect of entity-relation joint extraction; and pipeline extraction methods suffer from error propagation, which reduces extraction accuracy. In response to the above problems, an entity-relation joint extraction method from Chinese ancient text based on prompt learning and global pointer network was proposed. First, a prompt learning method of span-extraction reading comprehension was used to inject domain knowledge into the Pre-trained Language Model (PLM), so as to unify the optimization objectives of pre-training and fine-tuning, and to encode the input sentences. Then, global pointer networks were used to predict the boundaries of subject and object entities as well as the subject and object boundaries under different relations, and the predictions were jointly decoded and aligned into entity-relation triples, completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model. The model solves the entity nesting and relation overlapping problems, and avoids the error propagation of pipeline decoding. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. Experiments were conducted on the Shiji (Records of the Grand Historian) dataset: compared with the OneRel model before and after domain-knowledge injection, the F1 score of the proposed method increased by 4.71 and 1.91 percentage points, respectively. Experimental results show that the proposed method can better jointly extract entity-relation triples from Chinese ancient text, and provides new research ideas and methods for low-resource few-shot deep learning scenarios.
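The global pointer mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' PTBG implementation; it assumes the standard global-pointer span-scoring scheme, in which every candidate span (i, j) receives a bilinear score between projected token representations, and all names, dimensions, and the decoding threshold below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden, head_dim = 6, 16, 8

# Token encodings as a PLM encoder (e.g. BERT) would produce them
# for one sentence; random here for illustration only.
H = rng.normal(size=(seq_len, hidden))

def global_pointer_scores(H, Wq, Wk):
    """Score every candidate span (i, j): s[i, j] = (Wq h_i) . (Wk h_j).
    Spans with i > j are masked to -inf so only valid spans survive."""
    q = H @ Wq              # (seq_len, head_dim) start-position projections
    k = H @ Wk              # (seq_len, head_dim) end-position projections
    s = q @ k.T             # (seq_len, seq_len) score for each (start, end)
    mask = np.triu(np.ones_like(s, dtype=bool))  # keep i <= j only
    return np.where(mask, s, -np.inf)

# One pointer head per prediction task; a joint-extraction model would use
# separate heads for subject spans, object spans, and one head per relation.
Wq_subj, Wk_subj = rng.normal(size=(2, hidden, head_dim))
subj_scores = global_pointer_scores(H, Wq_subj, Wk_subj)

# Decode: keep every span whose score clears a threshold (0.0 here).
threshold = 0.0
subjects = [(i, j) for i in range(seq_len) for j in range(seq_len)
            if subj_scores[i, j] > threshold]
```

Because every valid (i, j) pair is scored independently, two overlapping or nested spans can both exceed the threshold and both be extracted, which is how this design accommodates entity nesting; relation-specific heads then let the decoder align subject and object boundaries into triples without a pipeline step.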

Key words: entity-relation joint extraction, global pointer network, prompt learning, Pre-trained Language Model (PLM), Chinese ancient text

CLC number: