Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (8): 2476-2482. DOI: 10.11772/j.issn.1001-9081.2023081166

• Data science and technology •

Automatic international classification of disease coding method incorporating heterogeneous information

Quanmei ZHANG1, Runping HUANG1, Fei TENG1, Haibo ZHANG2, Nan ZHOU1

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu Sichuan 611756, China
  2. Department of Computer Science, University of Otago, Otago 9040, New Zealand
  • Received: 2023-09-13 Revised: 2023-10-16 Accepted: 2023-11-03 Online: 2024-08-22 Published: 2024-08-10
  • Contact: Nan ZHOU
  • About the authors: ZHANG Quanmei, born in 1998, M.S. candidate. Her research interests include natural language processing and automatic ICD coding.
    HUANG Runping, born in 2000, M.S. candidate. Her research interests include natural language processing and text classification.
    TENG Fei, born in 1984, Ph.D., professor. Her research interests include natural language processing and medical informatics.
    ZHANG Haibo, born in 1982, Ph.D., associate professor. His research interests include medical informatics and high-performance computing.
  • Supported by:
    National Natural Science Foundation of China (62272398); Major Science and Technology Project of Sichuan Province (2023jdr0183)

Automatic International Classification of Diseases coding method fusing heterogeneous information

Quanmei ZHANG (张全梅)1, Runping HUANG (黄润萍)1, Fei TENG (滕飞)1, Haibo ZHANG (张海波)2, Nan ZHOU (周南)1

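  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China
  2. Department of Computer Science, University of Otago, Otago 9040, New Zealand
  • Corresponding author: Nan ZHOU
  • About the authors: ZHANG Quanmei (born 1998), female, from Jishou, Hunan, M.S. candidate, CCF member. Research interests: natural language processing, automatic ICD coding.
    HUANG Runping (born 2000), female, from Luzhou, Sichuan, M.S. candidate, CCF member. Research interests: natural language processing, text classification.
    TENG Fei (born 1984), female, from Tai'an, Shandong, professor, Ph.D., CCF senior member. Research interests: natural language processing, medical informatics.
    ZHANG Haibo (born 1982), male, from Jinan, Shandong, associate professor, Ph.D., CCF member. Research interests: medical informatics, high-performance computing.
    ZHOU Nan (born 1966), female, from Hangzhou, Zhejiang, professor, Ph.D. Research interests: education, research management, industry-university-research innovation. nzhou@swjtu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62272398); Major Science and Technology Project of Sichuan Province (2023jdr0183)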

Abstract:

To address the structural diversity of medical Electronic Health Records (EHRs) and the complex correlations among codes in the automatic International Classification of Diseases (ICD) coding task, an Automatic ICD Coding method integrating Heterogeneous Information (AIC-HI) was proposed. Firstly, separate feature extractors were designed for the distinct characteristics of the three kinds of heterogeneous data in the coding task: structured codes, semi-structured descriptions, and unstructured medical text. Secondly, a code knowledge graph was constructed to fit the hierarchical relationships among codes, and the association relationships between different branches were transformed into triples containing head and tail codes. Then, representation learning was used to fuse code and description information to calculate label features. Finally, an attention mechanism was used to extract the feature representations in the unstructured documents that are most relevant to the code labels. The experimental results show that, compared with the second-best baseline model MARN (Multitask bAlanced and Recalibrated Network), the micro-F1 score of AIC-HI over all codes on the real clinical dataset MIMIC-Ⅲ is increased by 4.3 percentage points.
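
As an illustration of how a code hierarchy can be turned into head/tail triples, the following is a minimal sketch and not the authors' implementation. It assumes ICD-9-style codes in which the parent of a code is obtained by dropping its last digit after the decimal point; the relation name `child_of` and the helper names `parent_of` and `hierarchy_triples` are hypothetical.

```python
from typing import List, Optional, Tuple

# (head code, relation, tail code); the relation name is an assumption.
Triple = Tuple[str, str, str]


def parent_of(code: str) -> Optional[str]:
    """Return the parent of an ICD-9-style code, or None for a top-level category.

    Assumption: the parent is found by dropping the last digit after the
    decimal point, e.g. "250.01" -> "250.0" -> "250".
    """
    if "." not in code:
        return None
    category, detail = code.split(".", 1)
    if len(detail) <= 1:
        return category
    return f"{category}.{detail[:-1]}"


def hierarchy_triples(codes: List[str]) -> List[Triple]:
    """Walk each code up to its category and collect (child, child_of, parent) triples."""
    triples = set()
    for code in codes:
        child = code
        parent = parent_of(child)
        while parent is not None:
            triples.add((child, "child_of", parent))
            child, parent = parent, parent_of(parent)
    return sorted(triples)


if __name__ == "__main__":
    for triple in hierarchy_triples(["250.01", "250.02"]):
        print(triple)
```

Sibling codes such as `250.01` and `250.02` end up sharing the ancestor `250.0`, which is the kind of structural link a graph-embedding model could exploit. The cross-branch association relationships mentioned in the abstract would need additional triples that this sketch does not attempt to derive.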

Key words: medical code prediction, automatic International Classification of Diseases (ICD) coding, hierarchical structure, heterogeneous information, Natural Language Processing (NLP)

Abstract:

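To address the structural diversity of medical Electronic Health Records (EHRs) and the complex correlations among codes in automatic International Classification of Diseases (ICD) coding, an automatic ICD coding method that fuses heterogeneous information, AIC-HI (Automatic ICD Coding integrating Heterogeneous Information), is proposed. First, multiple feature extractors are designed for the distinct characteristics of the three kinds of heterogeneous data in the coding task: structured codes, semi-structured descriptions, and unstructured medical text. Second, a code knowledge graph is constructed to fit the hierarchical relationships among codes, and the association relationships between different branches are converted into triples containing head and tail codes. Third, representation learning is used to fuse code and description information to compute label features. Finally, an attention mechanism extracts the feature representations in the unstructured documents that are most relevant to the code labels. Experimental results show that, compared with the second-best baseline model MARN (Multitask bAlanced and Recalibrated Network), AIC-HI improves the micro-F1 score over all codes on the real clinical dataset MIMIC-Ⅲ by 4.3 percentage points.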
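
The final attention step can be pictured with a generic label-wise dot-product attention, sketched below. This is an assumption about the general technique, not the formulation used in AIC-HI: each label vector scores every token vector of the document, the scores are normalized with a softmax, and the document is summarized as the weighted sum of its token vectors. The function names `softmax`, `dot`, and `label_attention` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_attention(tokens: List[Vector], label: Vector) -> Vector:
    """Summarize a document for one label.

    tokens: one feature vector per token of the unstructured text.
    label:  feature vector of a single code label (same dimension).
    Returns the attention-weighted sum of the token vectors.
    """
    weights = softmax([dot(token, label) for token in tokens])
    dim = len(label)
    return [sum(w * token[i] for w, token in zip(weights, tokens)) for i in range(dim)]


if __name__ == "__main__":
    document = [[1.0, 0.0], [0.0, 1.0]]
    # A label aligned with the first token pulls the summary toward that token.
    print(label_attention(document, [10.0, 0.0]))
```

Running the function once per label yields one document representation per code, so each code can attend to different parts of the same clinical note.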

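Keywords: medical code prediction, automatic International Classification of Diseases coding, hierarchical structure, heterogeneous information, natural language processing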

CLC Number: