Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1496-1503. DOI: 10.11772/j.issn.1001-9081.2024050676

• Artificial Intelligence •

Document-level relation extraction model based on anaphora and logical reasoning

Jie HU1,2,3, Cui WU1, Jun SUN1,2,3(), Yan ZHANG1,2,3

  1. School of Computer Science, Hubei University, Wuhan, Hubei 430062, China
    2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan, Hubei 430062, China
    3. Engineering Research Center of Hubei Province in Intelligent Government Affairs and Application of Artificial Intelligence (Hubei University), Wuhan, Hubei 430062, China
  • Received: 2024-05-27 Revised: 2024-08-28 Accepted: 2024-08-30 Online: 2024-09-05 Published: 2025-05-10
  • Corresponding author: Jun SUN
  • About the authors: HU Jie, born in 1977 in Hanchuan, Hubei, professor, Ph.D. Her research interests include complex semantic big data management and natural language processing.
    WU Cui, born in 2000 in Jingzhou, Hubei, M.S. candidate. Her research interests include natural language processing.
    SUN Jun, born in 1979 in Zaoyang, Hubei, lecturer, M.S. Her research interests include natural language processing.
    ZHANG Yan, born in 1974 in Yichang, Hubei, professor, Ph.D., CCF member. His research interests include software engineering and information security.
  • Supported by:
    National Natural Science Foundation of China (61977021)

Document-level relation extraction model based on anaphora and logical reasoning

Jie HU1,2,3, Cui WU1, Jun SUN1,2,3(), Yan ZHANG1,2,3   

  1. School of Computer Science, Hubei University, Wuhan, Hubei 430062, China
    2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan, Hubei 430062, China
    3. Engineering Research Center of Hubei Province in Intelligent Government Affairs and Application of Artificial Intelligence (Hubei University), Wuhan, Hubei 430062, China
  • Received:2024-05-27 Revised:2024-08-28 Accepted:2024-08-30 Online:2024-09-05 Published:2025-05-10
  • Contact: Jun SUN
  • About author:HU Jie, born in 1977, Ph. D., professor. Her research interests include complex semantic big data management, natural language processing.
    WU Cui, born in 2000, M. S. candidate. Her research interests include natural language processing.
    SUN Jun, born in 1979, M. S., lecturer. Her research interests include natural language processing.
    ZHANG Yan, born in 1974,Ph. D., professor. His research interests include software engineering, information security.
  • Supported by:
    National Natural Science Foundation of China(61977021)

Abstract (Chinese):

In the Document-level Relation Extraction (DocRE) task, existing models focus mainly on learning the interactions among entities in a document while neglecting the internal structure of entities, and they pay little attention to pronoun anaphora recognition or to the application of logical rules, so that the relationships among entities in a document are not modeled accurately enough. Therefore, an anaphor-aware relation graph was integrated into a Transformer-based architecture to model both inter-entity interactions and intra-entity structure, so that anaphora could be used to aggregate more contextual information onto the corresponding entities and improve relation extraction accuracy. In addition, logical rules were mined from relation annotations in a data-driven way to strengthen the understanding of and reasoning about implicit logical relationships in the text. To address sample imbalance, a weighted long-tail loss function was introduced to improve the recognition accuracy of rare relations. Experimental results on two public datasets, DocRED (Document-level Relation Extraction Dataset) and Re-DocRED (Revisiting Document-level Relation Extraction Dataset), show that the proposed model achieves the best performance: on the DocRED test set, the IgnF1 and F1 values of the model with a BERT encoder are 1.79 and 2.09 percentage points higher, respectively, than those of the baseline model ATLOP (Adaptive Thresholding and Localized cOntext Pooling), demonstrating the strong overall performance of the proposed model.

Keywords (Chinese): document-level relation extraction, anaphor-aware relation graph, logical rules, sample imbalance, weighted long-tail loss function

Abstract:

In the Document-level Relation Extraction (DocRE) task, existing models mainly focus on learning the interactions among entities in a document, neglect the internal structure of entities, and pay little attention to the recognition of pronoun references and the application of logical rules, which makes their modeling of inter-entity relationships insufficiently accurate. Therefore, an anaphor-aware relation graph was integrated into the Transformer architecture to model both the interactions among entities and the internal structure of entities, so that anaphora could be used to aggregate more contextual information onto the corresponding entities, thereby enhancing relation extraction accuracy. Moreover, a data-driven approach was used to mine logical rules from relation annotations, enhancing the understanding of and reasoning about implicit logical relationships in the text. To address sample imbalance, a weighted long-tail loss function was introduced to improve the accuracy of identifying rare relations. Experiments were conducted on two public datasets: DocRED (Document-level Relation Extraction Dataset) and Re-DocRED (Revisiting Document-level Relation Extraction Dataset). The results show that the proposed model achieves the best performance: when using BERT as the encoder, its IgnF1 and F1 values on the DocRED test set are 1.79 and 2.09 percentage points higher, respectively, than those of the baseline model ATLOP (Adaptive Thresholding and Localized cOntext Pooling), validating the strong overall performance of the proposed model.
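The weighted long-tail loss mentioned in the abstract can be illustrated with a minimal sketch. The paper does not publish its exact weighting scheme here, so the code below assumes a common choice: each relation's weight grows as its training frequency shrinks, tempered by an exponent `beta`, and the weights scale a standard cross-entropy term. Function names (`longtail_weights`, `weighted_ce`) and `beta` are illustrative, not from the paper.

```python
import math

def longtail_weights(relation_counts, beta=0.5):
    """Per-relation weights inversely related to frequency.

    Rare (long-tail) relations receive larger weights so their errors
    contribute more to the loss; beta in (0, 1] tempers the skew.
    Hypothetical scheme; the paper's exact weighting may differ.
    """
    total = sum(relation_counts.values())
    return {rel: (total / n) ** beta for rel, n in relation_counts.items()}

def weighted_ce(probs, gold_relations, weights):
    """Cross-entropy over gold relations, scaled by per-relation weights.

    probs maps each gold relation to its predicted probability.
    """
    return -sum(weights[r] * math.log(probs[r]) for r in gold_relations) / len(gold_relations)
```

With 900 "head_of" and 100 "spouse" annotations, the rare "spouse" relation gets a weight three times larger (`(1000/100)**0.5` vs `(1000/900)**0.5`), so a misclassified "spouse" pair is penalized more heavily.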

Key words: Document-level Relation Extraction (DocRE), anaphor-aware relation graph, logical rule, sample imbalance, weighted long-tail loss function
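The data-driven mining of logical rules from relation annotations can also be sketched. The abstract does not specify the rule form, so the code below assumes composition rules of the shape r1(x, y) ∧ r2(y, z) → r3(x, z), scored by support (how often the body fires) and confidence (how often the conclusion is also annotated); thresholds and function names are illustrative.

```python
from collections import defaultdict

def mine_composition_rules(triples, min_conf=0.8, min_support=2):
    """Mine rules r1(x,y) ∧ r2(y,z) → r3(x,z) from annotated triples.

    triples: iterable of (head, relation, tail) annotations.
    Returns {(r1, r2, r3): confidence} for rules meeting both thresholds.
    """
    facts = set(triples)
    by_head = defaultdict(list)          # head entity -> [(relation, tail)]
    for h, r, t in facts:
        by_head[h].append((r, t))

    body_count = defaultdict(int)        # times the body pattern (r1, r2) fires
    rule_count = defaultdict(int)        # times the conclusion is also annotated
    for h, r1, m in facts:
        for r2, t in by_head.get(m, ()):
            if t == h:                   # skip trivial cycles x -> y -> x
                continue
            body_count[(r1, r2)] += 1
            for r3, t2 in by_head.get(h, ()):
                if t2 == t:
                    rule_count[(r1, r2, r3)] += 1

    rules = {}
    for (r1, r2, r3), s in rule_count.items():
        conf = s / body_count[(r1, r2)]
        if s >= min_support and conf >= min_conf:
            rules[(r1, r2, r3)] = conf
    return rules
```

For example, if "born_in" followed by "part_of" consistently co-occurs with "citizen_of" between the endpoint entities, the rule (born_in, part_of → citizen_of) is kept and can back implicit-relation reasoning at inference time.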
