Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (4): 1055-1063. DOI: 10.11772/j.issn.1001-9081.2020060796
Special topic: Artificial Intelligence
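Received:
2020-06-11
Revised:
2020-10-13
Publication date:
2021-04-10
Release date:
2020-12-30
Corresponding author:
JIN Tao
About the authors:
CUI Bowen (b. 1996), male, from Yantai, Shandong; master's student; main research interests: deep learning, medical big data. JIN Tao (b. 1980), male, from Dangyang, Hubei; assistant researcher, Ph.D.; main research interests: business process management, workflow, clinical pathways, big data, data security. WANG Jianmin (b. 1968), male, from Panshi, Jilin; professor, Ph.D.; main research interests: data management and information systems, unstructured data management, business process and product lifecycle management, digital rights management, system security, database testing.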
CUI Bowen, JIN Tao, WANG Jianmin
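Abstract: Information extraction techniques for electronic medical records (EMRs) obtain useful key information from free-text EMRs, supporting hospital information management and subsequent information analysis and processing. This survey briefly introduces the main pipeline of current free-text EMR information extraction. It analyzes research from the past decade or so on the separate and joint extraction of three kinds of key information in free-text EMRs: named entities, entity modifiers, and relations between entities. It then compares and summarizes the main methods adopted, the datasets used, and the final experimental results of this work. In addition, it analyzes the characteristics, strengths, and weaknesses of several recent popular methods, summarizes the datasets commonly used in EMR information extraction, and analyzes the current state and development trends of related research in China.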
CUI Bowen, JIN Tao, WANG Jianmin. Overview of information extraction of free-text electronic medical records[J]. Journal of Computer Applications, 2021, 41(4): 1055-1063.