Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1809-1816. DOI: 10.11772/j.issn.1001-9081.2024050682

• Artificial intelligence

Document-level relation extraction based on entity representation enhancement

Haijie WANG, Guangxin ZHANG, Hai SHI, Shu CHEN

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2024-05-30 Revised: 2024-09-01 Accepted: 2024-09-13 Online: 2024-09-18 Published: 2025-06-10
  • Contact: Haijie WANG
  • About the authors: WANG Haijie, born in 2000, M.S. candidate. His research interests include natural language processing and relation extraction.
    ZHANG Guangxin, born in 1999, M.S. candidate. His research interests include natural language processing and relation extraction.
    SHI Hai, born in 2000, M.S. candidate. His research interests include data processing and time series analysis.
    CHEN Shu, born in 1969, Ph.D., associate professor. His research interests include machine learning and natural language processing.

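Document-level relation extraction based on entity representation enhancement

Haijie WANG, Guangxin ZHANG, Hai SHI, Shu CHEN

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Corresponding author: WANG Haijie
  • About the authors: WANG Haijie (born 2000), male, from Bengbu, Anhui, M.S. candidate. Main research interests: natural language processing, relation extraction. 6221905044@stu.jiangnan.edu.cn
    ZHANG Guangxin (born 1999), male, from Wuxi, Jiangsu, M.S. candidate. Main research interests: natural language processing, relation extraction.
    SHI Hai (born 2000), male, from Nanyang, Henan, M.S. candidate. Main research interests: data processing, time series analysis.
    CHEN Shu (born 1969), male, from Yancheng, Jiangsu, associate professor, Ph.D. Main research interests: machine learning, natural language processing.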

Abstract:

To address two shortcomings of existing entity representation learning for Document-level Relation Extraction (DocRE) tasks, namely that differences among entity mentions are ignored and that no paradigm exists for computing the complexity of entity-pair relation extraction, a DocRE model based on Entity Representation Enhancement (DREERE) was proposed. Firstly, an attention mechanism was used to evaluate how the mentions of an entity differ in their contribution to determining the relations of different entity pairs, so as to obtain more flexible entity representations. Secondly, the sentence importance distribution that the encoder computes for each entity pair was used to evaluate the complexity of extracting that pair's relation, and the two-hop information among entity pairs was used selectively to enhance entity-pair representations. Experiments were carried out on the popular datasets DocRED, Re-DocRED and DWIE. The results show that, compared with the best baseline models such as ATLOP (Adaptive Thresholding and Localized cOntext Pooling) and E2GRE (Entity and Evidence Guided Relation Extraction), the DREERE model improves the F1 value by 0.06, 0.14 and 0.23 percentage points, respectively, and the ign-F1 value (F1 score calculated by ignoring the triples that appear in the training set) by 0.07, 0.09 and 0.12 percentage points, respectively, indicating that the DREERE model can effectively acquire the semantic information of entities in documents.
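
The first step of the abstract, weighting an entity's mentions differently for each entity pair, can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything here is an assumption made for illustration: the scaled dot-product score, the use of the partner entity's vector as the query, and the hypothetical names `softmax` and `pair_aware_entity_repr`. It is not the DREERE implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pair_aware_entity_repr(mentions: List[Vector], partner: Vector) -> List[float]:
    """Pool the mention vectors of one entity, weighted by their relevance
    to the partner entity of the pair (assumed: scaled dot-product score)."""
    dim = len(partner)
    scores = [
        sum(a * b for a, b in zip(mention, partner)) / math.sqrt(dim)
        for mention in mentions
    ]
    weights = softmax(scores)
    # Weighted sum of mention vectors, one output value per dimension.
    return [
        sum(w * mention[i] for w, mention in zip(weights, mentions))
        for i in range(dim)
    ]


# Hypothetical toy input: two mentions of one entity, and a partner entity
# whose vector is aligned with the first mention.
mentions = [[1.0, 0.0], [0.0, 1.0]]
partner = [10.0, 0.0]
entity_repr = pair_aware_entity_repr(mentions, partner)
```

With a different partner vector the same entity receives a different pooled representation, which is the sense in which the representation depends on the entity pair rather than on the entity alone.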

Key words: Document-level Relation Extraction (DocRE), attention mechanism, Evidence Retrieval (ER), representation learning, two-hop information

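Abstract (Chinese version):

To address two problems in existing entity representation learning for Document-level Relation Extraction (DocRE) tasks, namely that differences among entity mentions are ignored and that a paradigm for computing the complexity of entity-pair relation extraction is lacking, a DocRE model based on Entity Representation Enhancement (DREERE) is proposed. Firstly, an attention mechanism is used to evaluate how entity mentions differ when determining the relations of different entity pairs, which yields more flexible entity representations. Secondly, the entity-pair sentence importance distribution computed by the encoder is used to evaluate the complexity of entity-pair relation extraction, and the two-hop information among entity pairs is then used selectively to enhance entity-pair representations. Finally, experiments are carried out on three popular datasets: DocRED, Re-DocRED and DWIE. The results show that, compared with the best baseline models (such as ATLOP (Adaptive Thresholding and Localized cOntext Pooling) and E2GRE (Entity and Evidence Guided Relation Extraction)), DREERE improves the F1 value by 0.06, 0.14 and 0.23 percentage points, respectively, and the ign-F1 value (the F1 score calculated by ignoring triples that appear in the training set) by 0.07, 0.09 and 0.12 percentage points, respectively, showing that the model can effectively acquire the semantic information of entities in documents.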
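
The ign-F1 metric mentioned in the abstract can be sketched as follows. This is one simple reading of the stated definition, assumed for illustration: triples seen in the training set are dropped from both the predicted and the gold sets before F1 is computed. Official evaluation scripts for the datasets may differ in detail, and the names `Triple`, `f1_score` and `ign_f1_score` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A relation triple: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def f1_score(pred: Set[Triple], gold: Set[Triple]) -> float:
    """Micro F1 between a predicted and a gold set of triples."""
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def ign_f1_score(
    pred: Iterable[Triple], gold: Iterable[Triple], train: Iterable[Triple]
) -> float:
    """F1 after ignoring every triple that also appears in the training set
    (assumed reading: filter both predictions and gold labels)."""
    seen = set(train)
    return f1_score(set(pred) - seen, set(gold) - seen)


# Hypothetical toy data, not taken from DocRED, Re-DocRED or DWIE.
train = [("A", "born_in", "X")]
gold = [("A", "born_in", "X"), ("B", "works_for", "Y"), ("C", "part_of", "Z")]
pred = [("A", "born_in", "X"), ("B", "works_for", "Y"), ("C", "part_of", "W")]

plain_f1 = f1_score(set(pred), set(gold))
ignored_f1 = ign_f1_score(pred, gold, train)
```

In this toy case the triple that was memorisable from training counts toward the plain F1 but not toward ign-F1, so the second value is lower.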

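Key words (Chinese version): document-level relation extraction, attention mechanism, evidence retrieval, representation learning, two-hop information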

CLC Number: