Journal of Computer Applications
HU Jie1, WU Cui1, SUN Jun2, ZHANG Yan1
Abstract: In document-level relation extraction, existing models mainly focus on learning interactions between entities while neglecting entity internal structures, and they pay little attention to pronoun (anaphor) resolution or to the use of logical rules in the document. As a result, these models do not model the relationships between entities in a document accurately enough. To address this issue, an anaphor-aware graph is integrated into the Transformer architecture to model both the interactions among entities and their internal structures; by leveraging anaphors, more contextual information is aggregated onto the corresponding entities, improving the accuracy of relation extraction. Moreover, a data-driven approach mines logical rules from relation annotations, strengthening the model's ability to understand and reason over logical relationships implicit in the text. To address sample imbalance, a weighted long-tail loss function is introduced to improve the identification of rare relations. The model was validated on two public datasets, DocRED and Re-DocRED, and the experimental results show that the proposed model outperforms the comparison models.
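The data-driven rule mining described above can be illustrated with a minimal sketch: from the relation annotations (triples), count how often a two-hop relation path r1(a,b) ∧ r2(b,c) co-occurs with a direct relation r3(a,c), and keep paths whose confidence exceeds a threshold. The function name, rule form (composition rules only), and thresholds below are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict

def mine_composition_rules(triples, min_conf=0.8, min_support=2):
    """Sketch of data-driven rule mining from relation annotations.

    Mines composition rules r1(a,b) & r2(b,c) => r3(a,c):
    a rule is kept if its body occurs at least `min_support` times
    and its confidence (body-and-head / body) reaches `min_conf`.
    This is an illustrative simplification, not the paper's method.
    """
    by_head = defaultdict(list)           # entity a -> [(r, b)] outgoing edges
    for a, r, b in triples:
        by_head[a].append((r, b))

    body_count = defaultdict(int)         # (r1, r2) -> body occurrences
    rule_count = defaultdict(int)         # (r1, r2, r3) -> body-and-head occurrences
    for a, r1, b in triples:
        for r2, c in by_head[b]:          # two-hop path a -r1-> b -r2-> c
            body_count[(r1, r2)] += 1
            for r3, c2 in by_head[a]:     # is there a direct edge a -r3-> c?
                if c2 == c:
                    rule_count[(r1, r2, r3)] += 1

    rules = []
    for (r1, r2, r3), n in rule_count.items():
        support = body_count[(r1, r2)]
        if support >= min_support and n / support >= min_conf:
            rules.append(((r1, r2), r3, n / support))
    return rules
```

For example, if the annotations contain father_of(a,b), father_of(b,c), and grandfather_of(a,c) for several entity chains, the miner would surface the composition rule (father_of, father_of) => grandfather_of with its observed confidence.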
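The weighted long-tail loss can likewise be sketched: rare relation labels receive larger per-label weights so that the model is penalized more for missing them. The "effective number" weighting scheme, the beta value, and the pure-Python weighted binary cross-entropy below are illustrative assumptions; the paper's exact formulation may differ.

```python
import math

def long_tail_weights(label_counts, beta=0.999):
    """Illustrative long-tail re-weighting: each relation label's weight is
    inversely related to its (effective) frequency, so rare labels weigh more.
    `beta` and the mean-normalization are assumptions for this sketch."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in label_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]           # normalized so the mean weight is 1

def weighted_bce(probs, targets, weights):
    """Per-label weighted binary cross-entropy for multi-label
    relation classification (pure-Python sketch)."""
    eps = 1e-12                              # numerical guard for log(0)
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        total += -w * (t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(probs)
```

With counts [1000, 10], the rare second label gets a much larger weight than the frequent first one, so errors on rare relations dominate the averaged loss.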
Key words: document-level relation extraction, anaphor-aware graph, logical rules, sample imbalance, weighted long-tail loss function
HU Jie, WU Cui, SUN Jun, ZHANG Yan. Document-level relation extraction model based on anaphora and logical reasoning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024050676.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024050676