Journal of Computer Applications, 2025, Vol. 45, Issue 6: 2016-2024. DOI: 10.11772/j.issn.1001-9081.2024060806

• Multimedia computing and computer simulation •

Wireless capsule endoscopy image classification model based on improved ConvNeXt

Xiang WANG¹, Qianqian CUI¹, Xiaoming ZHANG¹, Jianchao WANG¹, Zhenzhou WANG¹, Jialin SONG²

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
  2. School of Electrical Engineering, Hebei University of Technology, Tianjin 300130, China
  • Received: 2024-06-20 Revised: 2024-08-28 Accepted: 2024-09-03 Online: 2024-09-10 Published: 2025-06-10
  • Contact: Jianchao WANG
  • About the authors: WANG Xiang, born in 1978, Ph.D., associate professor. Her research interests include intelligent optimization algorithms and machine vision.
    CUI Qianqian, born in 2000, M.S. candidate. Her research interests include image processing and object detection.
    ZHANG Xiaoming, born in 1975, Ph.D., professor. His research interests include knowledge graphs and the Semantic Web.
    WANG Jianchao, born in 1991, Ph.D., lecturer. His research interests include intelligent information processing and machine vision.
    WANG Zhenzhou, born in 1978, Ph.D., professor. His research interests include image processing and pattern recognition.
    SONG Jialin, born in 2002. Her research interests include electrical engineering and automation.
  • Supported by:
    Science and Technology Research Project of Colleges and Universities in Hebei Province (QN2023185)

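Wireless capsule endoscopy image classification model with improved ConvNeXt

WANG Xiang (王向)¹, CUI Qianqian (崔倩倩)¹, ZHANG Xiaoming (张晓明)¹, WANG Jianchao (王建超)¹, WANG Zhenzhou (王震洲)¹, SONG Jialin (宋佳霖)²

  1. School of Information Science and Engineering, Hebei University of Science and Technology (河北科技大学), Shijiazhuang 050018
  2. School of Electrical Engineering, Hebei University of Technology (河北工业大学), Tianjin 300130
  • Corresponding author: WANG Jianchao (王建超)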
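  • About the authors: WANG Xiang (王向), born in 1978, female, from Shijiazhuang, Hebei, Ph.D., associate professor. Her research interests include intelligent optimization algorithms and machine vision.
    CUI Qianqian (崔倩倩), born in 2000, female, from Zhengzhou, Henan, M.S. candidate. Her research interests include image processing and object detection.
    ZHANG Xiaoming (张晓明), born in 1975, male, from Shijiazhuang, Hebei, Ph.D., professor, CCF member. His research interests include knowledge graphs and the Semantic Web.
    WANG Jianchao (王建超), born in 1991, male, from Shijiazhuang, Hebei, Ph.D., lecturer. His research interests include intelligent information processing and machine vision. E-mail: wjc107960@163.com
    WANG Zhenzhou (王震洲), born in 1978, male, from Shijiazhuang, Hebei, Ph.D., professor. His research interests include image processing and pattern recognition.
    SONG Jialin (宋佳霖), born in 2002, female, from Baoding, Hebei. Her research interests include electrical engineering and automation.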
  • Supported by:
    Science and Technology Research Project of Colleges and Universities in Hebei Province (QN2023185)

Abstract:

Existing Wireless Capsule Endoscopy (WCE) image classification models typically target a single disease or a specific organ, which makes them difficult to adapt to clinical needs. To address this problem, a WCE image classification model based on an improved ConvNeXt-T (ConvNeXt Tiny) was proposed. Firstly, a Simple parameter-free Attention Module (SimAM) was introduced into the feature extraction process so that the model focuses on the key regions of WCE images and accurately captures detailed features such as the boundaries and textures of lesion areas. Secondly, a Global Context Multi-scale Feature Fusion (GC-MFF) module was designed: the model's global context modeling capability was first improved with a Global Context Block (GC Block), and shallow and deep multi-scale features were then fused to obtain more representative WCE image features. Finally, the Cross Entropy (CE) loss function was optimized to address the large intra-class differences among WCE images. Experimental results on a WCE dataset show that, compared with the original ConvNeXt-T model, the proposed model improves accuracy and F1 value by 2.96 and 3.16 percentage points, respectively. Compared with Swin-B (Swin Transformer Base), the best-performing of the mainstream classification models, the proposed model reduces the number of parameters by 67.4% and improves accuracy and F1 value by 0.51 and 0.67 percentage points, respectively. These results indicate that the proposed model achieves better classification performance and can effectively assist doctors in accurately diagnosing digestive tract diseases.
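
To make the parameter-free attention step concrete, the following is a minimal sketch of SimAM-style reweighting applied to a single feature-map channel, written in plain Python. It follows the commonly published SimAM formulation (squared deviation from the channel mean, normalized by the channel variance, passed through a sigmoid). The function name `simam`, the default `e_lambda=1e-4`, and the toy inputs are assumptions for illustration; how the module is wired into ConvNeXt-T in the proposed model is not specified in the abstract and is not shown here.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Reweight one 2-D feature map (a list of rows) with SimAM-style attention.

    Illustrative sketch only: the name, default e_lambda, and list-based
    layout are assumptions, not details taken from the paper.
    """
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("SimAM needs at least two spatial positions")

    n = len(values) - 1                    # number of "other" neurons per position
    mu = sum(values) / len(values)         # channel mean
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: positions far from the channel mean score higher.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate
            out_row.append(v * weight)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Toy 2x2 channel with one distinctive activation (hypothetical values).
    for row in simam([[1.0, 1.0], [1.0, 5.0]]):
        print(row)
```

No learnable weights appear anywhere in the function, which is the sense in which the module is parameter-free: each attention weight is derived from the statistics of the channel itself, so positions that stand out from their surroundings receive a larger gate.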

Key words: capsule endoscopy, image classification, ConvNeXt, attention mechanism, Multi-scale Feature Fusion (MFF)

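Abstract:

Existing wireless capsule endoscopy (WCE) image classification models target only a single disease or are limited to one specific organ, and are therefore difficult to adapt to clinical needs. To address this problem, a WCE image classification model based on an improved ConvNeXt-T (ConvNeXt Tiny) is proposed. Firstly, a simple parameter-free attention module (SimAM) is introduced into the feature extraction process so that the model attends to the key regions of WCE images and precisely captures detailed features such as the boundaries and textures of lesion areas. Secondly, a global context multi-scale feature fusion (GC-MFF) module is designed: a global context block (GC Block) first improves the model's global context modeling capability, and shallow and deep multi-scale features are then fused to obtain more representative WCE image features. Finally, the cross entropy (CE) loss function is optimized to address the large intra-class differences among WCE images. Experimental results on a WCE dataset show that, compared with the original ConvNeXt-T model, the proposed model improves accuracy and F1 value by 2.96 and 3.16 percentage points, respectively. Compared with Swin-B (Swin Transformer Base), the best-performing of the mainstream classification models, the proposed model has 67.4% fewer parameters and improves accuracy and F1 value by 0.51 and 0.67 percentage points, respectively. These results indicate that the proposed model has better classification performance and can effectively assist doctors in accurately diagnosing digestive tract diseases.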
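
The reported comparisons combine three kinds of quantities: accuracy, F1 value, and a relative reduction in parameter count. The sketch below shows one way such figures could be computed in plain Python. It assumes macro-averaged F1 (the abstract does not state which averaging is used), and all function names, class labels, and numbers in the example are hypothetical rather than taken from the paper's dataset.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


def percentage_point_gain(new_score, base_score):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (new_score - base_score) * 100.0


def relative_reduction(new_count, base_count):
    """Relative reduction of new_count with respect to base_count, in percent."""
    return (base_count - new_count) / base_count * 100.0


if __name__ == "__main__":
    # Hypothetical labels for four images; not data from the paper.
    truth = ["normal", "polyp", "ulcer", "polyp"]
    preds = ["normal", "polyp", "polyp", "polyp"]
    print(accuracy_and_macro_f1(truth, preds))
    # Hypothetical parameter counts chosen only to illustrate the formula.
    print(relative_reduction(32.6, 100.0))
```

The distinction between the two helper functions matters when reading the results: the accuracy and F1 gains are absolute differences in percentage points, whereas the 67.4% figure is a reduction relative to the parameter count of Swin-B.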

Key words: capsule endoscopy, image classification, ConvNeXt, attention mechanism, multi-scale feature fusion

CLC Number: