Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (1): 145-149.DOI: 10.11772/j.issn.1001-9081.2020061008

Special Issue: The 8th China Conference on Data Mining (CCDM 2020)

• China Conference on Data Mining 2020 (CCDM 2020) •

Entity relation extraction method for guidelines of cardiovascular disease based on bidirectional encoder representation from transformers

WU Xiaoping1, ZHANG Qiang1, ZHAO Fang2, JIAO Lin2   

  1. School of Computer Science, Wuhan University, Wuhan Hubei 430072, China;
    2. Department of Cardiovascular Disease, Zhongnan Hospital of Wuhan University, Wuhan Hubei 430070, China
  • Received: 2020-05-31 Revised: 2020-09-09 Online: 2021-01-10 Published: 2020-09-17
  • Supported by:
    This work is partially supported by the Scientific Research Project of Hubei Provincial Health Committee (WJ2019M208).

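  • Corresponding author: ZHAO Fang
  • About the authors: WU Xiaoping, born in 1974, male, a native of Zhongxiang, Hubei, Ph.D., associate professor, CCF member. His research interests include knowledge graphs and Internet of Things big data; ZHANG Qiang, born in 1995, male, a native of Shangrao, Jiangxi, M.S. candidate. His research interests include knowledge graphs; ZHAO Fang, born in 1977, female, a native of Ezhou, Hubei, Ph.D., associate chief physician. Her research interests include the pathogenesis of cardiovascular disease and big data management for cardiovascular disease; JIAO Lin, born in 1989, female, a native of Wuhan, Hubei, M.S., resident physician. Her research interests include big data management for cardiovascular disease.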

Abstract: Entity relation extraction is a critical basic step in question answering, knowledge graph construction and information extraction in the medical field. Because no open dataset was available for building a knowledge graph specialized for cardiovascular disease, a professional dataset for entity relation extraction was constructed by collecting medical guidelines for cardiovascular disease and having the entity and relation categories professionally labeled. Based on this dataset, a Bidirectional Encoder Representation from Transformers and Convolutional Neural Network (BERT-CNN) model was first proposed to perform relation extraction on Chinese corpus. Then, because the word rather than the character is the fundamental semantic unit in Chinese, an improved BERT-CNN model based on whole word mask (BERT(wwm)-CNN) was proposed to improve relation extraction performance on Chinese corpus. Experimental results show that the improved BERT(wwm)-CNN model achieves an accuracy of 0.85, a recall of 0.80 and an F1 value of 0.83 on the constructed relation extraction dataset, outperforming the comparison models, Bidirectional Encoder Representation from Transformers and Long Short-Term Memory (BERT-LSTM) and BERT-CNN, which verifies the advantage of the improved BERT(wwm)-CNN.
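
The difference between character-level masking and the whole word mask mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the example sentence, its word segmentation, the masking probability and the function names `char_level_mask` and `whole_word_mask` are all assumptions made for illustration, and the token replacement scheme used in actual BERT pre-training is simplified to a plain `[MASK]` substitution.

```python
import random

MASK = "[MASK]"


def char_level_mask(words, mask_prob, rng):
    """Mask each Chinese character independently, ignoring word boundaries."""
    chars = [ch for word in words for ch in word]
    return [MASK if rng.random() < mask_prob else ch for ch in chars]


def whole_word_mask(words, mask_prob, rng):
    """Select whole words; every character of a selected word is masked together."""
    masked = []
    for word in words:
        if rng.random() < mask_prob:
            masked.extend([MASK] * len(word))
        else:
            masked.extend(word)  # extending with a string appends its characters
    return masked


# Hypothetical sentence, already split into words by an assumed segmenter.
words = ["高血压", "患者", "应", "服用", "阿司匹林"]
rng = random.Random(0)
print("char-level:", char_level_mask(words, 0.3, rng))
print("whole-word:", whole_word_mask(words, 0.3, rng))
```

With character-level masking, a multi-character term such as a drug or disease name may be only partly hidden, so the remaining characters reveal it; with whole word masking the model has to predict the complete term from its context.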

Key words: entity relation extraction, cardiovascular disease, Bidirectional Encoder Representation from Transformers (BERT) network, Convolutional Neural Network (CNN), knowledge graph
