Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2686-2692. DOI: 10.11772/j.issn.1001-9081.2021071317
Special Issue: Artificial intelligence
Medical named entity recognition model based on deep auto-encoding
Xudong HOU, Fei TENG, Yi ZHANG
Received: 2021-07-22
Revised: 2021-10-22
Accepted: 2021-10-25
Online: 2022-09-19
Published: 2022-09-10
Contact: Fei TENG
About author: HOU Xudong, male, born in 1996 in Nanyang, Henan, M. S. candidate. His research interests include medical big data analysis.
Xudong HOU, Fei TENG, Yi ZHANG. Medical named entity recognition model based on deep auto-encoding[J]. Journal of Computer Applications, 2022, 42(9): 2686-2692.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071317
Tab. 1 Entity class and quantity statistics in datasets

Dataset | Disease and diagnosis | Surgery | Anatomical site | Drug | Imaging examination | Laboratory test | Total texts
---|---|---|---|---|---|---|---
CCKS-19 | 4 212 | 1 029 | 8 426 | 1 822 | 969 | 1 195 | 1 000
CCKS-20 | 4 345 | 923 | 8 811 | 1 935 | 1 002 | 1 297 | 1 050
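Tab. 1 gives per-class counts but no per-dataset entity totals. The short Python sketch below is an arithmetic aid added for this page, not code from the article: it sums each row and reports every class's share. With the counts as transcribed, anatomical-site mentions make up nearly half of each dataset (about 47.7% of 17 653 entities in CCKS-19 and 48.1% of 18 313 in CCKS-20), while imaging examinations (CCKS-19) and surgeries (CCKS-20) are the rarest classes. The English class names are translations of the table headers.

```python
# Per-class entity counts transcribed from Tab. 1. The "Total texts" column counts
# documents rather than entities, so it is deliberately left out here.
TABLE1 = {
    "CCKS-19": {
        "Disease and diagnosis": 4212, "Surgery": 1029, "Anatomical site": 8426,
        "Drug": 1822, "Imaging examination": 969, "Laboratory test": 1195,
    },
    "CCKS-20": {
        "Disease and diagnosis": 4345, "Surgery": 923, "Anatomical site": 8811,
        "Drug": 1935, "Imaging examination": 1002, "Laboratory test": 1297,
    },
}


def class_shares(counts):
    """Return the total number of entities and each class's fraction of that total."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for dataset, counts in TABLE1.items():
    total, shares = class_shares(counts)
    print(f"{dataset}: {total} entities")
    # Largest class first, so the class imbalance is easy to see.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {name:<22}{share:6.1%}")
```

A skew this strong is one reason per-class scores, as in Tab. 3, are worth reading alongside a single overall F value.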
Tab. 2 FE evaluation statistics of each model

Model | CCKS-19 | CCKS-20
---|---|---
Ref. [ | 0.856 2 | -
Ref. [ | 0.851 6 | -
Model fusion + rules [ | - | 0.915 4
ChiEHRBert + entity fusion [ | - | 0.912 4
Ensemble [ | - | 0.905 1
CasSAttMNER | 0.943 9 | 0.945 7
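Tab. 2 reports one overall F value per model and dataset. The article's scoring script is not reproduced on this page, so the Python sketch below rests on an assumption: a strict, entity-level convention in which a prediction counts as correct only if its start offset, end offset and class all match a gold entity, with precision, recall and F1 micro-averaged over all entities. The (document id, start, end, class) tuple layout, the function name `strict_prf` and the toy data are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation: (document id, start offset, end offset, class).
Entity = Tuple[str, int, int, str]


def strict_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span-and-class matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # only matches on all four fields count
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [("d1", 0, 3, "Disease and diagnosis"), ("d1", 5, 7, "Drug"),
        ("d2", 2, 4, "Anatomical site")]
# One boundary error on the drug mention and one spurious surgery entity.
pred = [("d1", 0, 3, "Disease and diagnosis"), ("d1", 5, 8, "Drug"),
        ("d2", 2, 4, "Anatomical site"), ("d2", 9, 11, "Surgery")]
p, r, f = strict_prf(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

In the toy example, the drug mention predicted with end offset 8 instead of 7 counts as both a false positive and a false negative, and the spurious surgery entity adds another false positive, so P = 2/4 = 0.5, R = 2/3 and F1 = 4/7 ≈ 0.571. Because matching is done on sets, a duplicated prediction is counted only once.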
Tab. 3 Entity F value measure statistics of each model

Dataset | Model | Disease and diagnosis | Laboratory test | Surgery | Drug | Anatomical site | Imaging examination
---|---|---|---|---|---|---|---
CCKS-19 | Ref. [ | 0.842 9 | 0.769 4 | 0.833 3 | 0.960 2 | 0.861 8 | 0.862 9
CCKS-19 | Ref. [ | 0.828 1 | 0.756 5 | 0.867 9 | 0.944 9 | 0.859 9 | 0.880 1
CCKS-19 | CasSAttMNER | 0.942 9 | 0.930 6 | 0.909 1 | 0.912 9 | 0.954 9 | 0.974 1
CCKS-20 | Model fusion + rules [ | 0.905 3 | 0.835 0 | 0.962 1 | 0.937 5 | 0.920 0 | 0.884 7
CCKS-20 | ChiEHRBert + entity fusion [ | 0.911 0 | 0.857 1 | 0.955 2 | 0.929 3 | 0.911 6 | 0.886 2
CCKS-20 | Ensemble [ | 0.899 2 | 0.850 3 | 0.937 5 | 0.931 0 | 0.904 3 | 0.876 9
CCKS-20 | CasSAttMNER | 0.926 2 | 0.954 2 | 0.932 2 | 0.940 1 | 0.956 5 | 0.960 0
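Tab. 3 shows that CasSAttMNER does not lead in every column: it trails the strongest baseline on drug entities in CCKS-19 (0.912 9 vs 0.960 2) and on surgery entities in CCKS-20 (0.932 2 vs 0.962 1), and is only 0.002 6 ahead on CCKS-20 drugs. The Python sketch below, an illustration rather than part of the article, recomputes the per-class margin over the best baseline from the transcribed values. Baselines are kept anonymous because their reference numbers are truncated in the table; the helper name `gaps_vs_best_baseline` is hypothetical.

```python
# Per-class F values transcribed from Tab. 3, in the table's column order.
CLASSES = ["Disease and diagnosis", "Laboratory test", "Surgery",
           "Drug", "Anatomical site", "Imaging examination"]

TABLE3 = {
    "CCKS-19": {
        "baselines": [
            [0.8429, 0.7694, 0.8333, 0.9602, 0.8618, 0.8629],
            [0.8281, 0.7565, 0.8679, 0.9449, 0.8599, 0.8801],
        ],
        "CasSAttMNER": [0.9429, 0.9306, 0.9091, 0.9129, 0.9549, 0.9741],
    },
    "CCKS-20": {
        "baselines": [
            [0.9053, 0.8350, 0.9621, 0.9375, 0.9200, 0.8847],
            [0.9110, 0.8571, 0.9552, 0.9293, 0.9116, 0.8862],
            [0.8992, 0.8503, 0.9375, 0.9310, 0.9043, 0.8769],
        ],
        "CasSAttMNER": [0.9262, 0.9542, 0.9322, 0.9401, 0.9565, 0.9600],
    },
}


def gaps_vs_best_baseline(rows):
    """CasSAttMNER minus the best baseline, per entity class (rounded to 4 places)."""
    best = [max(column) for column in zip(*rows["baselines"])]
    return {c: round(ours - b, 4)
            for c, ours, b in zip(CLASSES, rows["CasSAttMNER"], best)}


for dataset, rows in TABLE3.items():
    gaps = gaps_vs_best_baseline(rows)
    trailing = [c for c, g in gaps.items() if g < 0]
    print(dataset, gaps, "trails best baseline on:", trailing)
```

Taking the column-wise maximum over baselines is a deliberately demanding comparison: no single baseline achieves all of those maxima at once.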
References
[1] National Health and Family Planning Commission of the People’s Republic of China. Standard for basic data sets of electronic medical record: [S]. Beijing: China Standard Press, 2014.
[2] General Office of the National Health Commission. Notice on printing and distributing the management standards for the application of electronic medical records (for trial implementation)[EB/OL]. (2017-02-23) [2021-05-14]..
[3] General Office of the National Health Commission. Notice on issuing the administrative measures (trial) and evaluation standards (trial) for the application level evaluation of the electronic medical record system[EB/OL]. (2018-12-09) [2021-05-14]..
[4] BODENREIDER O. The Unified Medical Language System (UMLS): integrating biomedical terminology[J]. Nucleic Acids Research, 2004, 32(S1): D267-D270. 10.1093/nar/gkh061
[5] PATRICK J, LI M. High accuracy information extraction of medication information from clinical notes: 2009 I2B2 medication extraction challenge[J]. Journal of the American Medical Informatics Association, 2010, 17(5): 524-527. 10.1136/jamia.2010.003939
[6] UZUNER Ö, SOUTH B R, SHEN S Y, et al. 2010 I2B2/VA challenge on concepts, assertions, and relations in clinical text[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 552-556. 10.1136/amiajnl-2011-000203
[7] SUN W Y, RUMSHISKY A, UZUNER Ö. Evaluating temporal relations in clinical text: 2012 I2B2 challenge[J]. Journal of the American Medical Informatics Association, 2013, 20(5): 806-813. 10.1136/amiajnl-2013-001628
[8] STUBBS A, KOTFILA C, UZUNER Ö. Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 I2B2/UTHealth shared task Track 1[J]. Journal of Biomedical Informatics, 2015, 58(S): S11-S19. 10.1016/j.jbi.2015.06.007
[9] YANG J F, GUAN Y, HE B, et al. Corpus construction for named entities and entity relations on Chinese electronic medical records[J]. Journal of Software, 2016, 27(11): 2725-2746. 10.13328/j.cnki.jos.004880
[10] CUI Y M, CHENG W X, LIU T, et al. Revisiting pre-trained models for Chinese natural language processing[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020: 657-668. 10.18653/v1/2020.findings-emnlp.58
[11] COLLINS M, SINGER Y. Unsupervised models for named entity classification[C]// Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Stroudsburg, PA: Association for Computational Linguistics, 1999: 100-110.
[12] TROTT P A. International classification of diseases for oncology[J]. Journal of Clinical Pathology, 1977, 30(8): 782-782. 10.1136/jcp.30.8.782-c
[13] CORNET R, DE KEIZER N. Forty years of SNOMED: a literature review[J]. BMC Medical Informatics and Decision Making, 2008, 8(S1): No.S2. 10.1186/1472-6947-8-s1-s2
[14] FRIEDMAN C, ALDERSON P O, AUSTIN J H M, et al. A general natural-language text processor for clinical radiology[J]. Journal of the American Medical Informatics Association, 1994, 1(2): 161-174. 10.1136/jamia.1994.95236146
[15] CODEN A, SAVOVA G, SOMINSKY I, et al. Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model[J]. Journal of Biomedical Informatics, 2009, 42(5): 937-949. 10.1016/j.jbi.2008.12.005
[16] SAVOVA G K, MASANZ J J, OGREN P V, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications[J]. Journal of the American Medical Informatics Association, 2010, 17(5): 507-513. 10.1136/jamia.2009.001560
[17] LI D C, KIPPER-SCHULER K, SAVOVA G. Conditional random fields and support vector machines for disorder named entity recognition in clinical texts[C]// Proceedings of the 2008 Workshop on Current Trends in Biomedical Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2008: 94-95. 10.3115/1572306.1572326
[18] CLARK C, ABERDEEN J, COARR M, et al. MITRE system for clinical assertion status classification[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 563-567. 10.1136/amiajnl-2011-000164
[19] JONNALAGADDA S, COHEN T, WU S, et al. Enhancing clinical concept extraction with distributional semantics[J]. Journal of Biomedical Informatics, 2012, 45(1): 129-140. 10.1016/j.jbi.2011.10.007
[20] WU Y H, JIANG M, LEI J B, et al. Named entity recognition in Chinese clinical text using deep neural network[J]. Studies in Health Technology and Informatics, 2015, 216: 624-628. 10.1136/amiajnl-2013-002381
[21] HUANG Z H, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[EB/OL]. (2015-08-09) [2021-05-14]..
[22] XU K, ZHOU Z F, HAO T Y, et al. A bidirectional LSTM and conditional random fields approach to medical named entity recognition[C]// Proceedings of the 2017 International Conference on Advanced Intelligent Systems and Informatics, AISC 639. Cham: Springer, 2017: 355-365.
[23] JI B, LIU R, LI S S, et al. A hybrid approach for named entity recognition in Chinese electronic medical record[J]. BMC Medical Informatics and Decision Making, 2019, 19(S2): No.64. 10.1186/s12911-019-0767-2
[24] BAEVSKI A, EDUNOV S, LIU Y H, et al. Cloze-driven pretraining of self-attention networks[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 5360-5369. 10.18653/v1/d19-1539
[25] LIU Y J, MENG F D, ZHANG J C, et al. GCDT: a global context enhanced deep transition architecture for sequence labeling[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 2431-2441. 10.18653/v1/p19-1233
[26] LI J, YE D H, SHANG S. Adversarial transfer for named entity boundary detection with pointer networks[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 5053-5059. 10.24963/ijcai.2019/702
[27] BALDI P, SADOWSKI P. The dropout learning algorithm[J]. Artificial Intelligence, 2014, 210: 78-122. 10.1016/j.artint.2014.02.004
[28] SUTTON C, McCALLUM A. An introduction to conditional random fields for relational learning[M]// GETOOR L, TASKAR B. Introduction to Statistical Relational Learning. Cambridge: MIT Press, 2007: 93-127. 10.7551/mitpress/7432.003.0006
[29] PARIKH A, TÄCKSTRÖM O, DAS D, et al. A decomposable attention model for natural language inference[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2016: 2249-2255. 10.18653/v1/d16-1244
[30] Yidu Cloud. Yidu-S4K: Yidu Cloud structured 4K data set[DS/OL]. (2020-11-09) [2021-05-14]..
[31] 2020 China Conference on Knowledge Graph and Semantic Computing. CCKS evaluation task CFP[EB/OL]. [2021-05-14].,2020.
[32] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2021-05-22]..
[33] QIAO R, YANG X R, HUANG W K. Medical named entity recognition based on BERT and model fusion[EB/OL]. [2021-05-14].. 10.1145/3490322.3490336
[34] LI N, LUO L, DING Z Y, et al. DUTIR at the CCKS-2019 task1: improving Chinese clinical named entity recognition using stroke ELMo and transfer learning[EB/OL]. [2021-05-14]..
[35] YAN Y T, ZHAO X Y, WU X. Medical named entity recognition based on BERT and character pattern and phonetic features[EB/OL]. [2021-05-14]..
[36] YANG W M, BI J L, ZOU J L, et al. Medical named entity recognition based on ChiEHRBert and multi-model fusion[EB/OL]. [2021-05-14]..
[37] ZHENG H Y, WEN R, CHEN X, et al. Medical named entity recognition using CRF-MT-Adapt and NER-MRC[EB/OL]. [2021-05-14].. 10.1109/cds52072.2021.00068