Journal of Computer Applications, 2023, Vol. 43, Issue (9): 2721-2726. DOI: 10.11772/j.issn.1001-9081.2022091388

• The 10th CCF Conference on Big Data, 2022 •

Automatic international classification of diseases coding model based on meta-network

Xiaomin ZHOU, Fei TENG, Yi ZHANG

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received: 2022-09-06 Revised: 2022-10-10 Accepted: 2022-10-11 Online: 2022-12-09 Published: 2023-09-10
  • Contact: Fei TENG
  • About the authors: ZHOU Xiaomin, born in 1997, from Zhangjiakou, Hebei, M. S. candidate. Her research interests include natural language processing and automatic International Classification of Diseases (ICD) coding.
    ZHANG Yi, born in 1998, from Yibin, Sichuan, M. S. candidate, CCF member. Her research interests include natural language processing and automatic ICD coding.
  • Supported by:
    Key Research and Development Program of Sichuan Province (2021YFG0136)

Abstract:

The frequency distribution of International Classification of Diseases (ICD) codes is long-tailed, which makes multi-label text classification of few-shot codes challenging. To address the shortage of training data in few-shot code classification, a Meta Network-based automatic ICD Coding model (MNIC) was proposed. Firstly, instances in the feature space and features in the semantic space were mapped into a shared space, and the feature representations of many-shot codes were mapped to their classifier weights, so that meta-knowledge was learned through the meta-network. Secondly, the learned meta-knowledge was transferred from data-abundant many-shot codes to data-poor few-shot codes. Finally, a reasonable explanation was provided for the transferability and generality of the meta-knowledge. Experimental results on the MIMIC-Ⅲ dataset show that, compared with the second-best AGM-HT (Adversarial Generative Model conditioned on code descriptions with Hierarchical Tree structure) model, MNIC improves the Micro-F1 and Micro Area Under Curve (Micro-AUC) of few-shot codes by 3.77 and 3.82 percentage points respectively, a significant improvement in few-shot code classification performance.
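The abstract does not specify the architecture of the meta-network, so the following is only a toy sketch of the general idea it describes: learn a mapping from the feature representation of a many-shot code to its classifier weights, then reuse that mapping for few-shot codes. Everything here is an assumption made for illustration: a linear map fitted by ridge regression stands in for the meta-network, a code's feature representation is taken to be the mean of its instance features, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def fit_meta_mapping(prototypes, weights, ridge=1e-2):
    """Fit a linear map M such that prototypes @ M approximates weights.

    Stand-in for the meta-network (assumption): closed-form ridge regression.
    """
    dim = prototypes.shape[1]
    gram = prototypes.T @ prototypes + ridge * np.eye(dim)
    return np.linalg.solve(gram, prototypes.T @ weights)


def predict_classifier_weights(mapping, prototypes):
    """Apply the learned map to the prototypes of few-shot codes."""
    return prototypes @ mapping


rng = np.random.default_rng(0)
dim = 8

# Synthetic many-shot codes: each has a prototype and a classifier weight
# vector that was (hypothetically) trained on abundant data.
true_map = rng.normal(size=(dim, dim))
many_protos = rng.normal(size=(200, dim))
many_weights = many_protos @ true_map + 0.01 * rng.normal(size=(200, dim))

# Learn the meta-knowledge from the many-shot codes only.
mapping = fit_meta_mapping(many_protos, many_weights)

# Few-shot codes: 5 codes with 3 labelled instances each.
few_instances = rng.normal(size=(5, 3, dim))
few_protos = few_instances.mean(axis=1)

# Transfer: generate classifier weights for the few-shot codes.
few_weights = predict_classifier_weights(mapping, few_protos)

# Score one document feature vector against every few-shot code
# (independent sigmoid per code, as in multi-label classification).
doc = rng.normal(size=dim)
scores = 1.0 / (1.0 + np.exp(-(few_weights @ doc)))
```

In this sketch the few-shot classifiers are never trained directly; they are generated from a handful of instances by a mapping learned entirely on many-shot codes, which is the transfer step the abstract refers to.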

Key words: automatic International Classification of Diseases (ICD) coding, few-shot learning, meta-learning, Natural Language Processing (NLP), interpretability

CLC number: