Journal of Computer Applications, 2024, Vol. 44, Issue (8): 2476-2482. DOI: 10.11772/j.issn.1001-9081.2023081166
• Data science and technology •
Quanmei ZHANG¹, Runping HUANG¹, Fei TENG¹, Haibo ZHANG², Nan ZHOU¹
Received: 2023-09-13
Revised: 2023-10-16
Accepted: 2023-11-03
Online: 2024-08-22
Published: 2024-08-10
Contact: Nan ZHOU
About author:
ZHANG Quanmei, born in 1998 in Jishou, Hunan, M.S. candidate, CCF member. Her research interests include natural language processing and automatic ICD coding.
Supported by:
CLC Number:
Quanmei ZHANG, Runping HUANG, Fei TENG, Haibo ZHANG, Nan ZHOU. Automatic international classification of disease coding method incorporating heterogeneous information[J]. Journal of Computer Applications, 2024, 44(8): 2476-2482.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023081166
Dataset | Split | Samples | Avg. words | Total labels | Unique labels | Avg. label count |
---|---|---|---|---|---|---|
MIMIC-Ⅲ-Top50 | Training | 8 066 | 1 529.69 | 45 919 | 50 | 918.38 |
MIMIC-Ⅲ-Top50 | Validation | 1 573 | 1 799.67 | 9 283 | 50 | 185.66 |
MIMIC-Ⅲ-Top50 | Test | 1 729 | 1 825.45 | 10 477 | 50 | 209.54 |
MIMIC-Ⅲ-full | Training | 47 723 | 1 484.54 | 758 216 | 8 686 | 87.29 |
MIMIC-Ⅲ-full | Validation | 1 631 | 1 784.71 | 28 897 | 3 009 | 9.60 |
MIMIC-Ⅲ-full | Test | 3 372 | 1 792.33 | 61 579 | 4 075 | 15.11 |
Tab. 1 MIMIC-Ⅲ dataset division
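The figures in Tab. 1 suggest that the last column is the total label count divided by the number of unique labels (occurrences per distinct code), not the number of labels per record. The table itself does not say this, so the sketch below only checks the arithmetic against the reported values. The function names and the labels-per-record quantity are illustrative additions, not part of the source.

```python
# Rows copied from Tab. 1:
# (dataset, split, samples, total_labels, unique_labels, reported_avg_label_count)
ROWS = [
    ("MIMIC-III-Top50", "training", 8066, 45919, 50, 918.38),
    ("MIMIC-III-Top50", "validation", 1573, 9283, 50, 185.66),
    ("MIMIC-III-Top50", "test", 1729, 10477, 50, 209.54),
    ("MIMIC-III-full", "training", 47723, 758216, 8686, 87.29),
    ("MIMIC-III-full", "validation", 1631, 28897, 3009, 9.60),
    ("MIMIC-III-full", "test", 3372, 61579, 4075, 15.11),
]


def occurrences_per_code(total_labels, unique_labels):
    """Average number of times each distinct code is assigned."""
    return round(total_labels / unique_labels, 2)


def labels_per_record(total_labels, samples):
    """Average number of codes per record (hypothetical extra statistic)."""
    return round(total_labels / samples, 2)


if __name__ == "__main__":
    for dataset, split, samples, total, unique, reported in ROWS:
        print(
            dataset,
            split,
            "per code:", occurrences_per_code(total, unique),
            "reported:", reported,
            "per record:", labels_per_record(total, samples),
        )
```

Under this reading, the per-record average can be derived separately from the Samples and Total labels columns when needed.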
Model | MIMIC-Ⅲ-Full Macro-AUC | MIMIC-Ⅲ-Full Micro-AUC | MIMIC-Ⅲ-Full Macro-F1 | MIMIC-Ⅲ-Full Micro-F1 | MIMIC-Ⅲ-Full P@8 | MIMIC-Ⅲ-Top50 Macro-AUC | MIMIC-Ⅲ-Top50 Micro-AUC | MIMIC-Ⅲ-Top50 Macro-F1 | MIMIC-Ⅲ-Top50 Micro-F1 | MIMIC-Ⅲ-Top50 P@5 |
---|---|---|---|---|---|---|---|---|---|---|
CAML | 0.820 | 0.966 | 0.048 | 0.442 | 0.523 | 0.875 | 0.909 | 0.532 | 0.614 | 0.609 |
DR-CAML | 0.826 | 0.966 | 0.049 | 0.457 | 0.515 | 0.884 | 0.916 | 0.576 | 0.633 | 0.618 |
MultiResCNN | 0.910 | 0.986 | 0.085 | 0.552 | 0.734 | 0.899 | 0.928 | 0.606 | 0.670 | 0.641 |
MDBERT | 0.925 | 0.101 | 0.555 | 0.727 | 0.918 | 0.936 | 0.659 | 0.692 | 0.654 | |
MNIC | 0.907 | 0.982 | 0.576 | — | 0.914 | 0.937 | 0.615 | 0.658 | — | |
MARN | 0.913 | 0.988 | 0.116 | 0.682 | ||||||
AIC-HI | 0.989 | 0.094 | 0.627 | 0.762 | 0.931 | 0.951 | 0.729 | 0.678 |
Tab. 2 Experimental results on MIMIC-Ⅲ dataset
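The metrics reported in Tab. 2 can be illustrated with a small, dependency-free sketch. It assumes binary label matrices (one row per record, one column per code) and averages per-label F1 for the macro score. Some ICD coding codebases instead take the harmonic mean of macro-averaged precision and recall, so their values may differ from this sketch. The function names and toy data are hypothetical and are not taken from the paper. AUC is omitted for brevity.

```python
def _counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 0)
    return tp, fp, fn


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1: rare codes count as much as common ones."""
    n_labels = len(y_true[0])
    return sum(_f1(*_counts(y_true, y_pred, j)) for j in range(n_labels)) / n_labels


def micro_f1(y_true, y_pred):
    """F1 over pooled counts: dominated by frequent codes."""
    n_labels = len(y_true[0])
    tp = fp = fn = 0
    for j in range(n_labels):
        c_tp, c_fp, c_fn = _counts(y_true, y_pred, j)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return _f1(tp, fp, fn)


def precision_at_k(y_true, scores, k):
    """Mean fraction of the k highest-scored codes per record that are correct."""
    total = 0.0
    for truth, row in zip(y_true, scores):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        total += sum(truth[j] for j in top_k) / k
    return total / len(y_true)


# Toy data: 3 records, 3 codes; the third code is rare and never predicted.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
SCORES = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.2]]

if __name__ == "__main__":
    print("macro F1:", macro_f1(Y_TRUE, Y_PRED))
    print("micro F1:", micro_f1(Y_TRUE, Y_PRED))
    print("P@2:", precision_at_k(Y_TRUE, SCORES, 2))
```

In the toy data the never-predicted rare code pulls the macro score well below the micro score, which is the same pattern as the gap between Macro-F1 and Micro-F1 on the full label set in Tab. 2.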
Model | MIMIC-Ⅲ-Full Macro-F1 | MIMIC-Ⅲ-Full Micro-F1 | MIMIC-Ⅲ-Top50 Macro-F1 | MIMIC-Ⅲ-Top50 Micro-F1 |
---|---|---|---|---|
-fusion | 0.062 | 0.493 | 0.571 | 0.645 |
-attention | 0.081 | 0.619 | 0.630 | 0.718 |
AIC-HI | 0.094 | 0.627 | 0.679 | 0.729 |
Tab. 3 Results of ablation experiments on MIMIC-Ⅲ dataset
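One way to read Tab. 3 is as the F1 lost when a component is removed. The sketch below computes those absolute drops from the reported values. It assumes the `-fusion` and `-attention` rows are variants of AIC-HI with that component removed, as the leading minus sign suggests. The dictionary keys and function name are illustrative.

```python
# Values copied from Tab. 3; key names are illustrative.
ABLATION = {
    "-fusion": {
        "full_macro_f1": 0.062, "full_micro_f1": 0.493,
        "top50_macro_f1": 0.571, "top50_micro_f1": 0.645,
    },
    "-attention": {
        "full_macro_f1": 0.081, "full_micro_f1": 0.619,
        "top50_macro_f1": 0.630, "top50_micro_f1": 0.718,
    },
    "AIC-HI": {
        "full_macro_f1": 0.094, "full_micro_f1": 0.627,
        "top50_macro_f1": 0.679, "top50_micro_f1": 0.729,
    },
}


def drop_vs_full_model(variant):
    """Absolute F1 lost by the ablated variant relative to the AIC-HI row."""
    full = ABLATION["AIC-HI"]
    ablated = ABLATION[variant]
    return {metric: round(full[metric] - ablated[metric], 3) for metric in full}


if __name__ == "__main__":
    for variant in ("-fusion", "-attention"):
        print(variant, drop_vs_full_model(variant))
```

On these numbers the `-fusion` variant loses more F1 than the `-attention` variant on every reported metric.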