Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2686-2692. DOI: 10.11772/j.issn.1001-9081.2021071317
• Artificial intelligence •
Xudong HOU, Fei TENG, Yi ZHANG
Received: 2021-07-22
Revised: 2021-10-22
Accepted: 2021-10-25
Online: 2022-09-19
Published: 2022-09-10
Contact: Fei TENG
About author: HOU Xudong, born in 1996 in Nanyang, Henan, M. S. candidate. His research interests include medical big data analysis.
Supported by:
CLC Number:
Xudong HOU, Fei TENG, Yi ZHANG. Medical named entity recognition model based on deep auto-encoding[J]. Journal of Computer Applications, 2022, 42(9): 2686-2692.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071317
Dataset | Disease and diagnosis | Surgery | Anatomical site | Drug | Imaging examination | Laboratory test | Total texts |
---|---|---|---|---|---|---|---|
CCKS-19 | 4 212 | 1 029 | 8 426 | 1 822 | 969 | 1 195 | 1 000 |
CCKS-20 | 4 345 | 923 | 8 811 | 1 935 | 1 002 | 1 297 | 1 050 |
Tab. 1 Entity class and quantity statistics in datasets
Model | CCKS-19 | CCKS-20 |
---|---|---|
Reference [ | 0.8562 | |
Reference [ | 0.8516 | |
Model fusion + rules [ | | 0.9154 |
ChiEHRBert + entity fusion [ | | 0.9124 |
Ensemble [ | | 0.9051 |
CasSAttMNER | 0.9439 | 0.9457 |
Tab. 2 FE evaluation statistics of each model
Dataset | Model | Disease and diagnosis | Laboratory test | Surgery | Drug | Anatomical site | Imaging examination |
---|---|---|---|---|---|---|---|
CCKS-19 | Reference [ | 0.8429 | 0.7694 | 0.8333 | 0.9602 | 0.8618 | 0.8629 |
CCKS-19 | Reference [ | 0.8281 | 0.7565 | 0.8679 | 0.9449 | 0.8599 | 0.8801 |
CCKS-19 | CasSAttMNER | 0.9429 | 0.9306 | 0.9091 | 0.9129 | 0.9549 | 0.9741 |
CCKS-20 | Model fusion + rules [ | 0.9053 | 0.8350 | 0.9621 | 0.9375 | 0.9200 | 0.8847 |
CCKS-20 | Entity fusion [ | 0.9110 | 0.8571 | 0.9552 | 0.9293 | 0.9116 | 0.8862 |
CCKS-20 | Ensemble [ | 0.8992 | 0.8503 | 0.9375 | 0.9310 | 0.9043 | 0.8769 |
CCKS-20 | CasSAttMNER | 0.9262 | 0.9542 | 0.9322 | 0.9401 | 0.9565 | 0.9600 |
Tab. 3 Entity F value measure statistics of each model
1 | National Health and Family Planning Commission of the People’s Republic of China. Standard for basic data sets of electronic medical record: [S]. Beijing: China Standard Press, 2014. 10.3969/j.issn.1672-7185.2019.02.002 |
2 | General Office of the National Health Commission. Notice on printing and distributing the management standards for the application of electronic medical records (for trial implementation)[EB/OL]. (2017-02-23) [2021-05-14].. 10.31901/24566764.2014/05.02.02 |
3 | General Office of the National Health Commission. Notice on issuing the administrative measures (trial) and evaluation standards (trial) for the application level evaluation of the electronic medical record system[EB/OL]. (2018-12-09) [2021-05-14].. 10.37544/0720-5953-2018-09-12 |
4 | BODENREIDER O. The Unified Medical Language System (UMLS): integrating biomedical terminology[J]. Nucleic Acids Research, 2004, 32(S1): D267-D270. 10.1093/nar/gkh061 |
5 | PATRICK J, LI M. High accuracy information extraction of medication information from clinical notes: 2009 I2B2 medication extraction challenge[J]. Journal of the American Medical Informatics Association, 2010, 17(5): 524-527. 10.1136/jamia.2010.003939 |
6 | UZUNER Ö, SOUTH B R, SHEN S Y, et al. 2010 I2B2/VA challenge on concepts, assertions, and relations in clinical text[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 552-556. 10.1136/amiajnl-2011-000203 |
7 | SUN W Y, RUMSHISKY A, UZUNER O. Evaluating temporal relations in clinical text: 2012 I2B2 challenge[J]. Journal of the American Medical Informatics Association, 2013, 20(5): 806-813. 10.1136/amiajnl-2013-001628 |
8 | STUBBS A, KOTFILA C, UZUNER Ö. Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 I2B2/UTHealth shared task Track 1[J]. Journal of Biomedical Informatics, 2015, 58(S): S11-S19. 10.1016/j.jbi.2015.06.007 |
9 | YANG J F, GUAN Y, HE B, et al. Corpus construction for named entities and entity relations on Chinese electronic medical records[J]. Journal of Software, 2016, 27(11): 2725-2746. 10.13328/j.cnki.jos.004880 |
10 | CUI Y M, CHENG W X, LIU T, et al. Revisiting pre-trained models for Chinese natural language processing[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020:657-668. 10.18653/v1/2020.findings-emnlp.58 |
11 | COLLINS M, SINGER Y. Unsupervised models for named entity classification[C]// Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Stroudsburg, PA: Association for Computational Linguistics, 1999:100-110. |
12 | TROTT P A. International classification of diseases for oncology[J]. Journal of Clinical Pathology, 1977, 30(8): 782-782. 10.1136/jcp.30.8.782-c |
13 | CORNET R, DE KEIZER N. Forty years of SNOMED: a literature review[J]. BMC Medical Informatics and Decision Making, 2008, 8(S1): No.S2. 10.1186/1472-6947-8-s1-s2 |
14 | FRIEDMAN C, ALDERSON P O, AUSTIN J H M, et al. A general natural-language text processor for clinical radiology[J]. Journal of the American Medical Informatics Association, 1994, 1(2): 161-174. 10.1136/jamia.1994.95236146 |
15 | CODEN A, SAVOVA G, SOMINSKY I, et al. Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model[J]. Journal of Biomedical Informatics, 2009, 42(5): 937-949. 10.1016/j.jbi.2008.12.005 |
16 | SAVOVA G K, MASANZ J J, OGREN P V, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications[J]. Journal of the American Medical Informatics Association, 2010, 17(5): 507-513. 10.1136/jamia.2009.001560 |
17 | LI D C, KIPPER-SCHULER K, SAVOVA G. Conditional random fields and support vector machines for disorder named entity recognition in clinical texts[C]// Proceedings of the 2008 Workshop on Current Trends in Biomedical Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2008: 94-95. 10.3115/1572306.1572326 |
18 | CLARK C, ABERDEEN J, COARR M, et al. MITRE system for clinical assertion status classification[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 563-567. 10.1136/amiajnl-2011-000164 |
19 | JONNALAGADDA S, COHEN T, WU S, et al. Enhancing clinical concept extraction with distributional semantics[J]. Journal of Biomedical Informatics, 2012, 45(1): 129-140. 10.1016/j.jbi.2011.10.007 |
20 | WU Y H, JIANG M, LEI J B, et al. Named entity recognition in Chinese clinical text using deep neural network[J]. Studies in Health Technology and Informatics, 2015, 216: 624-628. 10.1136/amiajnl-2013-002381 |
21 | HUANG Z H, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[EB/OL]. (2015-08-09) [2021-05-14].. |
22 | XU K, ZHOU Z F, HAO T Y, et al. A bidirectional LSTM and conditional random fields approach to medical named entity recognition[C]// Proceedings of the 2017 International Conference on Advanced Intelligent Systems and Informatics, AISC 639. Cham: Springer, 2017: 355-365. |
23 | JI B, LIU R, LI S S, et al. A hybrid approach for named entity recognition in Chinese electronic medical record[J]. BMC Medical Informatics and Decision Making, 2019, 19(S2): No.64. 10.1186/s12911-019-0767-2 |
24 | BAEVSKI A, EDUNOV S, LIU Y H, et al. Cloze-driven pretraining of self-attention networks[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 5360-5369. 10.18653/v1/d19-1539 |
25 | LIU Y J, MENG F D, ZHANG J C, et al. GCDT: a global context enhanced deep transition architecture for sequence labeling[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 2431-2441. 10.18653/v1/p19-1233 |
26 | LI J, YE D H, SHANG S. Adversarial transfer for named entity boundary detection with pointer networks[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2019: 5053-5059. 10.24963/ijcai.2019/702 |
27 | BALDI P, SADOWSKI P. The dropout learning algorithm[J]. Artificial Intelligence, 2014, 210: 78-122. 10.1016/j.artint.2014.02.004 |
28 | SUTTON C, McCALLUM A. An introduction to conditional random fields for relational learning[M]// GETOOR L, TASKAR B. Introduction to Statistical Relational Learning. Cambridge: MIT Press, 2007: 93-127. 10.7551/mitpress/7432.003.0006 |
29 | PARIKH A, TÄCKSTRÖM O, DAS D, et al. A decomposable attention model for natural language inference[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2016: 2249-2255. 10.18653/v1/d16-1244 |
30 | Yidu Cloud. Yidu-S4K: Yidu Cloud structured 4K data set[DS/OL]. (2020-11-09) [2021-05-14].. |
31 | 2020 China Conference on Knowledge Graph and Semantic Computing. CCKS evaluation task CFP[EB/OL]. [2021-05-14].,2020. 10.1155/2021/8884282 |
32 | KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2021-05-22].. |
33 | QIAO R, YANG X R, HUANG W K. Medical named entity recognition based on BERT and model fusion[EB/OL]. [2021-05-14].. 10.1145/3490322.3490336 |
34 | LI N, LUO L, DING Z Y, et al. DUTIR at the CCKS-2019 task1: improving Chinese clinical named entity recognition using stroke ELMo and transfer learning[EB/OL]. [2021-05-14].. |
35 | YAN Y T, ZHAO X Y, WU X. Medical named entity recognition based on BERT and character pattern and phonetic features[EB/OL]. [2021-05-14].. |
36 | YANG W M, BI J L, ZOU J L, et al. Medical named entity recognition based on ChiEHRBert and multi-model fusion[EB/OL]. [2021-05-14].. |
37 | ZHENG H Y, WEN R, CHEN X, et al. Medical named entity recognition using CRF-MT-Adapt and NER-MRC[EB/OL]. [2021-05-14].. 10.1109/cds52072.2021.00068 |