Journal of Computer Applications, 2025, Vol. 45, Issue (12): 4030-4036. DOI: 10.11772/j.issn.1001-9081.2024121745

• Multimedia computing and computer simulation •

Pneumonia X-ray image classification model by MV2-Transformer with feature fusion

Jinru PING¹, Ziwen SUN¹,²

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi Jiangsu 214122, China
    2. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education (Jiangnan University), Wuxi Jiangsu 214122, China
  • Received: 2024-12-12 Revised: 2025-03-14 Accepted: 2025-03-17 Online: 2025-03-27 Published: 2025-12-10
  • Contact: Ziwen SUN
  • About author: PING Jinru, born in 2000 in Nantong, Jiangsu, M. S. candidate, CCF member. His research interests include medical image processing and pneumonia image classification.
    SUN Ziwen, born in 1968 in Dazhu, Sichuan, Ph. D., professor. Her research interests include medical image processing and network attack detection.
  • Supported by:
    National Natural Science Foundation of China (62173160)

Abstract:

To address the difficulty of extracting lesion-area features from pneumonia X-ray images and the insufficient lightweighting of existing models, a Feature Fusion MV2-Transformer (FFMV2-Transformer) model for pneumonia X-ray image classification was proposed. Firstly, the lightweight network MobileNetV2 (Mobile Network Version 2) was adopted as the backbone, and the Coordinate Attention (CA) mechanism was embedded in its inverted residual bottleneck blocks. This embeds positional information into channel information and thus improves the model's ability to extract features from lesion areas. Secondly, a Local and Global Feature Fusion Module (LGFFM) was designed to combine the local features extracted by convolutional layers with the global features captured by the Transformer, so that the model captures both the details and the overall structure of lesion areas, further strengthening its semantic feature extraction. Finally, a Cross-layer Feature Fusion Module (CFFM) was designed to combine the spatial information of shallow features, enhanced by a spatial attention mechanism, with the semantic information of deep features, enhanced by a channel attention mechanism, thereby obtaining rich contextual information. To verify the effectiveness of the model, ablation and comparison experiments were conducted on a pneumonia X-ray dataset. The results show that, compared with the MobileViT (Mobile Vision Transformer) model, FFMV2-Transformer improves accuracy, precision, recall, F1-score and AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) by 1.09, 0.31, 1.91, 1.08 and 0.40 percentage points, respectively. These results indicate that FFMV2-Transformer effectively extracts lesion-area features from pneumonia X-ray images while keeping the model lightweight.

Key words: pneumonia X-ray image, attention mechanism, MobileNetV2 (Mobile Network Version 2), Transformer, feature fusion
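
The abstract says CA is embedded in MobileNetV2's inverted residual blocks but gives no formula. The pure-Python sketch below is for illustration only. It implements the generic coordinate attention operation (Hou et al., CVPR 2021) on a single C × H × W feature map. The operation pools along each spatial direction, encodes the pooled features with a shared 1×1 convolution, and then gates every position with a row weight and a column weight. It is not the authors' code. Batch normalization is omitted, the hard-swish activation and reduction ratio are assumptions, and the names `coordinate_attention`, `conv1x1`, `w_reduce`, `w_h` and `w_w` are hypothetical. The abstract does not say exactly where CA sits inside the inverted residual block, so that placement is not modeled here.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map laid out as [channel][row][col], i.e. C x H x W
Matrix = List[List[float]]         # 2-D list; for weights: [out_channel][in_channel] of a 1x1 conv


def h_swish(v: float) -> float:
    """Hard-swish non-linearity: v * ReLU6(v + 3) / 6 (assumed activation)."""
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(weights: Matrix, feats: Matrix) -> Matrix:
    """Hypothetical helper: a bias-free 1x1 convolution on features laid out as [channel][position]."""
    n_pos = len(feats[0])
    return [
        [sum(w_kc * feats[c][p] for c, w_kc in enumerate(row)) for p in range(n_pos)]
        for row in weights
    ]


def coordinate_attention(x: Tensor3, w_reduce: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Sketch of coordinate attention on one C x H x W map (no batch norm).

    w_reduce: (C/r) x C shared 1x1 conv applied to the concatenated pooled features.
    w_h, w_w: C x (C/r) 1x1 convs producing the height-wise and width-wise gates.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    # 1) Directional average pooling: average over width gives one value per row,
    #    average over height gives one value per column.
    pooled_h = [[sum(x[c][i]) / width for i in range(height)] for c in range(channels)]
    pooled_w = [[sum(x[c][i][j] for i in range(height)) / height for j in range(width)]
                for c in range(channels)]
    # 2) Concatenate along the spatial axis, then shared 1x1 conv + non-linearity.
    concat = [pooled_h[c] + pooled_w[c] for c in range(channels)]
    mid = [[h_swish(v) for v in row] for row in conv1x1(w_reduce, concat)]
    # 3) Split back into height and width parts and produce per-channel sigmoid gates.
    mid_h = [row[:height] for row in mid]
    mid_w = [row[height:] for row in mid]
    gate_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, mid_h)]  # C x H
    gate_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, mid_w)]  # C x W
    # 4) Reweight every position by its row gate and its column gate.
    return [[[x[c][i][j] * gate_h[c][i] * gate_w[c][j] for j in range(width)]
             for i in range(height)] for c in range(channels)]


if __name__ == "__main__":
    # Toy map with C=2, H=2, W=3 and reduction ratio r=2 (so C/r = 1).
    x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[-1.0, 0.0, 1.0], [2.0, -2.0, 0.5]]]
    y = coordinate_attention(x, [[0.5, -0.5]], [[0.0], [0.0]], [[0.0], [0.0]])
    # With zero output weights every gate is sigmoid(0) = 0.5, so y equals 0.25 * x.
    print(y[0][0][0])  # 0.25
```

Because both gates are sigmoid outputs in (0, 1), CA only rescales features. The product of a row gate and a column gate lets each channel emphasize a particular row and column. That is the intuition behind the abstract's statement that CA embeds positional information into channel information.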

中图分类号: