Overview of information extraction of free-text electronic medical records

doi:10.11772/j.issn.1001-9081.2020060796

Abstract

Abstract: Information extraction technology for electronic medical records can obtain useful key information from free-text electronic medical records, thereby supporting hospital information management and subsequent information analysis and processing. The main process of free-text electronic medical record information extraction at the present stage is briefly introduced. Research results from the past decade or more on separate and joint extraction methods for three types of key information in free-text electronic medical records (named entities, entity assertions, and relations between entities) are analyzed, and the main methods adopted, the datasets used, and the final experimental results of these studies are compared and summarized. In addition, the characteristics, advantages, and disadvantages of several recent popular methods are analyzed, the datasets commonly used in electronic medical record information extraction are summarized, and the current status and development trends of related fields in China are analyzed.

Key words: information extraction, named entity recognition, entity assertion detection, entity relation extraction, electronic medical record

Chinese Library Classification (CLC) number:

TP391.1

CUI Bowen, JIN Tao, WANG Jianmin. Overview of information extraction of free-text electronic medical records[J]. Journal of Computer Applications, 2021, 41(4): 1055-1063.
