Journal of Computer Applications, 2023, Vol. 43, Issue 9: 2700-2706. DOI: 10.11772/j.issn.1001-9081.2022091419

• The 10th CCF Conference on Big Data (2022) •

Forest-based entity-relation joint extraction model

Xuanli WANG1,2, Xiaolong JIN1,2, Zhongni HOU1,2, Huaming LIAO1,2, Jin ZHANG1,2

  1. Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-09-21 Revised: 2022-11-07 Accepted: 2022-11-14 Online: 2023-02-24 Published: 2023-09-10
  • Contact: Xiaolong JIN
  • About the authors: WANG Xuanli, born in 1996, native of Bozhou, Anhui, M.S., CCF member. Her research interests include knowledge graphs.
    HOU Zhongni, born in 1996, native of Qingdao, Shandong, Ph.D. candidate. Her research interests include knowledge graphs and event logic graphs.
    LIAO Huaming, born in 1972, native of Chengdu, Sichuan, Ph.D., associate professor. Her research interests include big data applications, information retrieval, and distributed data processing.
    ZHANG Jin, born in 1978, native of Yingcheng, Hubei, Ph.D., senior engineer. His research interests include public opinion analysis, natural language processing, and big data processing.

Abstract:

Nested entities pose a challenge to joint entity-relation extraction. Existing joint extraction models generate a large number of negative examples and incur high complexity when handling nested entities, and they do not account for the interference of nested entities with triple prediction. To address these problems, a forest-based joint entity-relation extraction method named EF2LTF (Entity Forest to Layering Triple Forest) is proposed. EF2LTF adopts a two-stage joint training framework. First, an entity forest is generated so that the different entities inside a nested entity can be identified flexibly. Then, the identified nested entities and their hierarchical structure are combined to generate a layered triple forest. Experimental results on four benchmark datasets show that EF2LTF achieves the best F1 score compared with methods such as the Set Prediction Network (SPN) model, the span-based joint entity and relation extraction model SpERT (Span-based Entity and Relation Transformer), and Dynamic Graph Information Extraction ++ (DyGIE++). These results indicate that the proposed method improves both the recognition of nested entities and the ability to distinguish nested entities when constructing triples, thereby improving the joint extraction of entities and relations.

Key words: entity-relation joint extraction, triplet generation, nested entity, layering prediction, entity forest
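
The abstract describes the entity forest only at a high level. As an illustration of the underlying data structure, and not of the EF2LTF model itself (which generates the forest with a trained network), the following minimal sketch groups already-known entity spans into trees by containment. The names `EntityNode`, `build_entity_forest`, and `depth_of` are hypothetical, and the sketch assumes spans are either properly nested or disjoint.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EntityNode:
    """One entity span; children are entities nested inside it (hypothetical type)."""
    start: int  # inclusive token index
    end: int    # exclusive token index
    label: str
    children: List["EntityNode"] = field(default_factory=list)


def build_entity_forest(spans: List[Tuple[int, int, str]]) -> List[EntityNode]:
    """Arrange (start, end, label) spans into trees by span containment.

    Assumption: spans are properly nested or disjoint. A span that only
    partially overlaps another is not attached as its child.
    """
    roots: List[EntityNode] = []
    stack: List[EntityNode] = []  # chain of currently open enclosing spans
    # Visit outer spans first: earlier start, then longer span.
    for start, end, label in sorted(spans, key=lambda s: (s[0], -s[1])):
        node = EntityNode(start, end, label)
        # Drop open spans that do not contain the current span.
        while stack and not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def depth_of(node: EntityNode) -> int:
    """Number of nesting layers in the tree rooted at `node`."""
    return 1 + max((depth_of(child) for child in node.children), default=0)


# Tokens: University of Chinese Academy of Sciences in Beijing
forest = build_entity_forest([(2, 6, "ORG"), (7, 8, "LOC"), (0, 6, "ORG")])
```

Here the outer organization span `(0, 6)` becomes a root with the inner span `(2, 6)` as its child, while the location span `(7, 8)` forms a separate single-node tree. A layered structure of this kind is what a second stage could consult when deciding which level of a nested entity takes part in a triple.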

CLC number: