Journal of Computer Applications, 2021, Vol. 41, Issue (4): 1055-1063. DOI: 10.11772/j.issn.1001-9081.2020060796

Special Issue: Artificial Intelligence

• Artificial intelligence

Overview of information extraction of free-text electronic medical records

CUI Bowen, JIN Tao, WANG Jianmin   

  1. School of Software, Tsinghua University, Beijing 100084, China
  • Received: 2020-06-11; Revised: 2020-10-13; Online: 2021-04-10; Published: 2020-12-30
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71690231).

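  • Corresponding author: JIN Tao
  • About the authors: CUI Bowen (born 1996), male, from Yantai, Shandong, M.S. candidate; research interests: deep learning, medical big data. JIN Tao (born 1980), male, from Dangyang, Hubei, assistant research fellow, Ph.D.; research interests: business process management, workflow, clinical pathways, big data, data security. WANG Jianmin (born 1968), male, from Panshi, Jilin, professor, Ph.D.; research interests: data management and information systems, unstructured data management, business process and product lifecycle management, digital rights management, system security, database testing.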

Abstract: Information extraction technology can extract key information from free-text electronic medical records, supporting hospital information management and subsequent information analysis. This paper briefly introduces the main process of information extraction from free-text electronic medical records and reviews research from the past decade or so on separate and joint extraction methods for the three most important types of information: named entities, entity assertions, and entity relations. The methods, datasets, and final experimental results of these studies are compared and summarized. In addition, the features, advantages, and disadvantages of several popular recent methods are analyzed, the datasets commonly used for information extraction from free-text electronic medical records are summarized, and the current status and development trends of related research in China are discussed.

Key words: information extraction, named entity recognition, entity assertion detection, entity relation extraction, electronic medical record

CLC Number: