Journal of Computer Applications, 2021, Vol. 41, Issue 9: 2517-2522. DOI: 10.11772/j.issn.1001-9081.2020111842

Topic: Artificial Intelligence


基于头实体注意力的实体关系联合抽取方法

刘雅璇1,2, 钟勇1,2   

  1. 中国科学院 成都计算机应用研究所, 成都 610041;
  2. 中国科学院大学, 北京 100049
  • 收稿日期:2020-11-24 修回日期:2021-03-24 出版日期:2021-09-10 发布日期:2021-05-12
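  • Corresponding author: ZHONG Yong (钟勇)
  • About the authors: LIU Yaxuan (刘雅璇, born 1997), female, from Shangrao, Jiangxi, M.S. candidate; her research interests include natural language processing and big data. ZHONG Yong (钟勇, born 1966), male, from Yuechi, Sichuan, research fellow, Ph.D., CCF member; his research interests include big data, artificial intelligence, and software process.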
  • 基金资助:
    四川省科技成果转移转化平台项目(2020ZHCG0002)。

Joint extraction method of entities and relations based on subject attention

LIU Yaxuan1,2, ZHONG Yong1,2   

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-11-24 Revised: 2021-03-24 Online: 2021-05-12 Published: 2021-09-10
  • Supported by:
    This work is partially supported by the Science and Technology Achievement Transformation Platform Program of Sichuan Province (2020ZHCG0002).

摘要: 实体关系抽取是构建大规模知识图谱及各种信息抽取任务的关键步骤。基于预训练语言模型,提出基于头实体注意力的实体关系联合抽取方法。该方法采用卷积神经网络(CNN)提取头实体关键信息,并采用注意力机制捕获头实体与尾实体之间的依赖关系,构建了基于头实体注意力的联合抽取模型(JSA)。在公共数据集纽约时报语料库(NYT)和采用远程监督方法构建的人工智能领域数据集上进行实验,所提模型的F1值相较于级联二元标记框架(CasRel)分别获得了1.8和8.9个百分点的提升。

关键词: 实体关系抽取, 联合抽取, 自然语言处理, 注意力机制, 领域知识图谱, 头实体

Abstract: Entity and relation extraction is a key step in building large-scale knowledge graphs and in various information extraction tasks. Based on a pre-trained language model, a joint entity and relation extraction method based on subject attention was proposed. In this method, a Convolutional Neural Network (CNN) was used to extract the key information of the subject, and an attention mechanism was used to capture the dependency between the subject and the object. On this basis, a Joint extraction model based on Subject Attention (JSA) was built. In experiments on the public New York Times corpus (NYT) dataset and on an artificial intelligence domain dataset built by distant supervision, the F1 score of the proposed model was 1.8 and 8.9 percentage points higher, respectively, than that of the Cascade binary tagging framework for Relational triple extraction (CasRel).

Key words: entity and relation extraction, joint extraction, Natural Language Processing (NLP), attention mechanism, domain knowledge graph, subject
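
The abstract names two components: a CNN that condenses the subject span into a feature vector, and an attention step that relates the subject to the rest of the sentence. The toy sketch below illustrates that general idea in plain Python. It is not the authors' JSA implementation, whose details the abstract does not give. The function names (`conv_max_pool`, `attention_weights`, `fuse`), the ReLU and max-pooling choices, the scaled dot-product scoring, the additive fusion, and all numbers are assumptions made for illustration. In a real model the token vectors would come from a pre-trained language model encoder and the kernels would be learned.

```python
import math
from typing import List

Vector = List[float]


def conv_max_pool(span: List[Vector], kernels: List[List[Vector]]) -> Vector:
    """Apply each kernel (width x dim) along the span, then ReLU and max-pool."""
    features = []
    for kernel in kernels:
        width, dim = len(kernel), len(kernel[0])
        # Zero-pad spans that are shorter than the kernel width.
        padded = span + [[0.0] * dim] * max(0, width - len(span))
        best = 0.0  # starting at 0.0 acts as the ReLU floor
        for start in range(len(padded) - width + 1):
            response = sum(
                kernel[j][d] * padded[start + j][d]
                for j in range(width)
                for d in range(dim)
            )
            best = max(best, response)
        features.append(best)
    return features


def attention_weights(tokens: List[Vector], subject_vec: Vector) -> List[float]:
    """Softmax over scaled dot products between each token and the subject."""
    scale = math.sqrt(len(subject_vec))
    scores = [sum(t * s for t, s in zip(tok, subject_vec)) / scale for tok in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(tokens: List[Vector], subject_vec: Vector, weights: List[float]) -> List[Vector]:
    """Add the attention-weighted subject vector to every token vector."""
    return [
        [t + w * s for t, s in zip(tok, subject_vec)]
        for tok, w in zip(tokens, weights)
    ]


# Hypothetical 2-dimensional token vectors; the subject spans tokens 0 and 1.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
subject_span = tokens[0:2]

# Two hand-written kernels of width 2, so the subject vector has the same
# dimension as the tokens (an assumption that keeps the dot product simple).
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

subject_vec = conv_max_pool(subject_span, kernels)  # [2.0, 0.0]
weights = attention_weights(tokens, subject_vec)
fused = fuse(tokens, subject_vec, weights)
```

In a cascade tagging setup such as the one CasRel uses, subject-aware token vectors like `fused` would then feed relation-specific object taggers. That downstream step is omitted here.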
