Journal of Computer Applications, 2024, Vol. 44, Issue (11): 3379-3385. DOI: 10.11772/j.issn.1001-9081.2023101516

• Artificial Intelligence •

Document-level relationship extraction based on evidence enhancement and multi-feature fusion

Xinyue YAN, Shuqun YANG, Yongbin GAO

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-11-06 Revised: 2024-01-31 Accepted: 2024-02-04 Online: 2024-11-13 Published: 2024-11-10
  • Contact: Yongbin GAO
  • About author: YAN Xinyue, born in 2000, from Linyi, Shandong, M. S. candidate. Her research interests include natural language processing.
    YANG Shuqun, born in 1970, from Linchuan, Jiangxi, Ph. D., professor. Her research interests include data analysis and data mining, pattern recognition, artificial intelligence, and knowledge engineering.
  • Supported by: Shanghai Local Capacity Building Project (21010501500); Social Development Science and Technology Research Project of the Shanghai "Science and Technology Innovation Action Plan" (21DZ1204900)

Abstract:

Document-level Relation Extraction (DocRE) aims to identify all relations that hold between entity pairs in a document. To address the ineffective use of evidence sentences and document information, as well as entities that have multiple mentions, a multi-feature fusion DocRE model named EMF (Evidence Multi-feature Fusion) was constructed on top of evidence-enhanced contextual features. Firstly, entity type markers were inserted before and after each entity, and relation text features were associated with entity mentions to obtain relation-specific entity features. Secondly, fragment representations were obtained with different convolutional kernels, and entity-pair-aware multi-granularity fragment-level features were obtained through an attention mechanism; meanwhile, the evidence distribution was used to enhance the contextual features that are highly relevant to the entity pair. Finally, the above features were fused for relation classification, and at inference time the extracted evidence sentences were assembled into a pseudo-document, which was fed to the classifier together with the original document. Experimental results on the DocRE dataset DocRED (Document-level Relation Extraction Dataset) show that, with BERTbase as the Pre-trained Language Model (PLM) encoder, EMF improves Ign F1 and F1 by 0.42 and 0.41 percentage points respectively over the state-of-the-art model EIDER (EvIDence-Enhanced DocRE), reaching an F1 of 62.89%. These results indicate that EMF focuses more on the parts of the document related to entities and relations, which improves extraction accuracy and gives the model good interpretability.

Key words: document-level, relationship extraction, evidence, mention attention, fragment feature
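The abstract names EMF's building blocks but does not give their exact formulas. The pure-Python sketch below is only an illustration. It shows one plausible way to compute entity-pair-aware multi-granularity fragment features and an evidence-weighted context feature, and then fuse them. All function names are hypothetical. Several choices are assumptions that the abstract does not state:

- one scalar weight per kernel offset, in place of a learned convolution layer;
- the element-wise product of the head and tail embeddings as the attention query;
- averaging across granularities;
- softmax normalisation of the evidence scores;
- concatenation as the fusion operator.

```python
import math
from typing import List

Vector = List[float]


def conv1d(tokens: List[Vector], kernel: List[float]) -> List[Vector]:
    """Valid 1-D convolution along the token axis (hypothetical simplification:
    every hidden dimension shares one scalar weight per kernel offset)."""
    width = len(kernel)
    if width > len(tokens):
        raise ValueError("kernel is wider than the token sequence")
    dim = len(tokens[0])
    fragments = []
    for start in range(len(tokens) - width + 1):
        frag = [0.0] * dim
        for offset, weight in enumerate(kernel):
            for d in range(dim):
                frag[d] += weight * tokens[start + offset][d]
        fragments.append(frag)
    return fragments


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention pooling of `keys` with a single query."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(query))]


def fragment_feature(tokens: List[Vector], head: Vector, tail: Vector,
                     kernels: List[List[float]]) -> Vector:
    """Entity-pair-aware multi-granularity fragment feature (assumed design)."""
    query = [h * t for h, t in zip(head, tail)]  # assumption: pair query = head * tail
    per_granularity = [attend(query, conv1d(tokens, k)) for k in kernels]
    return [sum(vals) / len(vals) for vals in zip(*per_granularity)]  # average granularities


def evidence_context(sentences: List[Vector], evidence_scores: List[float]) -> Vector:
    """Context feature pooled over sentences with a softmax-normalised evidence distribution."""
    dist = softmax(evidence_scores)
    return [sum(p * s[d] for p, s in zip(dist, sentences)) for d in range(len(sentences[0]))]


def fuse(head: Vector, tail: Vector, fragment: Vector, context: Vector) -> Vector:
    """Concatenate all features; the abstract does not specify EMF's fusion operator."""
    return head + tail + fragment + context


# Toy usage with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
head, tail = [1.0, 0.0], [1.0, 1.0]
kernels = [[1.0], [0.5, 0.5], [1 / 3] * 3]  # fragments of 1, 2 and 3 tokens
frag = fragment_feature(tokens, head, tail, kernels)
ctx = evidence_context([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])  # sentence 0 is likelier evidence
features = fuse(head, tail, frag, ctx)
print(len(features))  # 8
```

In a trained model the kernels, the pair query and the evidence scores would come from learned layers. Here they are fixed so that the intermediate values are easy to inspect.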
