Wireless capsule endoscopy image classification model based on improved ConvNeXt

DOI: 10.11772/j.issn.1001-9081.2024060806

Journal of Computer Applications, 2025, Vol. 45, Issue (6): 2016-2024.

Section: Multimedia computing and computer simulation


Xiang WANG¹, Qianqian CUI¹, Xiaoming ZHANG¹, Jianchao WANG¹, Zhenzhou WANG¹, Jialin SONG²

1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang Hebei 050018, China
2. School of Electrical Engineering, Hebei University of Technology, Tianjin 300130, China

Received: 2024-06-20; Revised: 2024-08-28; Accepted: 2024-09-03; Online: 2024-09-10; Published: 2025-06-10
Contact: Jianchao WANG
About authors:
WANG Xiang, born in 1978, from Shijiazhuang, Hebei, Ph. D., associate professor. Her research interests include intelligent optimization algorithms and machine vision.
CUI Qianqian, born in 2000, from Zhengzhou, Henan, M. S. candidate. Her research interests include image processing and object detection.
ZHANG Xiaoming, born in 1975, from Shijiazhuang, Hebei, Ph. D., professor, CCF member. His research interests include knowledge graphs and the semantic Web.
WANG Jianchao, born in 1991, from Shijiazhuang, Hebei, Ph. D., lecturer. His research interests include intelligent information processing and machine vision. Email: wjc107960@163.com
WANG Zhenzhou, born in 1978, from Shijiazhuang, Hebei, Ph. D., professor. His research interests include image processing and pattern recognition.
SONG Jialin, born in 2002, from Baoding, Hebei. Her research interests include electrical engineering and automation.
Supported by: Science and Technology Research Project of Colleges and Universities in Hebei Province (QN2023185)

Abstract

Existing Wireless Capsule Endoscopy (WCE) image classification models typically target a single disease or are limited to a specific organ. This makes them difficult to adapt to clinical needs. To address this problem, a WCE image classification model based on an improved ConvNeXt-T (ConvNeXt Tiny) was proposed. Firstly, a Simple parameter-free Attention Module (SimAM) was introduced into the model's feature extraction process. SimAM makes the model focus on the key areas of WCE images, so that it accurately captures detailed features such as the boundaries and textures of lesion areas. Secondly, a Global Context Multi-scale Feature Fusion (GC-MFF) module was designed. In this module, a Global Context Block (GC Block) first optimizes the model's global context modeling capability. Shallow and deep multi-scale features are then fused to obtain WCE image features with stronger representation ability. Finally, the Cross Entropy (CE) loss function was optimized to address the large intra-class differences among WCE images. Experimental results on a WCE dataset show that, compared with the original ConvNeXt-T model, the proposed model increases the accuracy and F1 score by 2.96 and 3.16 percentage points, respectively. Compared with Swin-B (Swin Transformer Base), which performs best among the mainstream classification models, the proposed model reduces the number of parameters by 67.4% and increases the accuracy and F1 score by 0.51 and 0.67 percentage points, respectively. These results indicate that the proposed model has better classification performance and can effectively assist doctors in accurately diagnosing digestive tract diseases.
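The two attention components named above can be made concrete with a minimal NumPy sketch of their forward passes. The sketch follows the descriptions in the original SimAM [25] and GCNet [26] papers. It is not this article's code, which is not given here.

The sketch rests on several assumptions:
- features use a (B, C, H, W) layout;
- SimAM uses λ = 1e-4 (the SimAM paper's default);
- the GC Block uses a bottleneck ratio of 1/16;
- convolution biases are omitted;
- the LayerNorm has unit scale and zero shift;
- random weights stand in for trained ones.

The function and class names (`simam`, `GCBlock`) are hypothetical. How GC-MFF then fuses shallow and deep features, and where the modules sit inside ConvNeXt-T, is not specified here, so that part is left out.

```python
import numpy as np


def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM attention on a (B, C, H, W) feature map."""
    _, _, h, w = x.shape
    n = h * w - 1
    # Squared deviation of each activation from its channel's spatial mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    # Per-channel variance estimate used by the energy function.
    v = d.sum(axis=(2, 3), keepdims=True) / n
    # Inverse minimal energy: more distinctive neurons get larger values.
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5
    # x * sigmoid(e_inv), written as a single division.
    return x / (1.0 + np.exp(-e_inv))


def _softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def _layer_norm(z: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # LayerNorm over the last axis (unit scale, zero shift, for brevity).
    mu = z.mean(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(z.var(axis=-1, keepdims=True) + eps)


class GCBlock:
    """Global Context Block: context modeling, bottleneck transform, add fusion."""

    def __init__(self, channels: int, ratio: float = 1 / 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        hidden = max(1, int(channels * ratio))
        # Random stand-ins for trained 1x1 convolution weights (no biases).
        self.w_k = rng.normal(0.0, 0.02, size=channels)            # C -> 1
        self.w_1 = rng.normal(0.0, 0.02, size=(hidden, channels))  # C -> C/r
        self.w_2 = rng.normal(0.0, 0.02, size=(channels, hidden))  # C/r -> C

    def __call__(self, x: np.ndarray) -> np.ndarray:
        b, c, h, w = x.shape
        flat = x.reshape(b, c, h * w)                              # (B, C, N)
        # Context modeling: one softmax weight per spatial position.
        attn = _softmax(np.einsum("c,bcn->bn", self.w_k, flat), axis=1)
        context = np.einsum("bcn,bn->bc", flat, attn)              # (B, C)
        # Transform: bottleneck with LayerNorm and ReLU.
        t = np.maximum(_layer_norm(context @ self.w_1.T), 0.0)     # (B, C/r)
        t = t @ self.w_2.T                                         # (B, C)
        # Fusion: broadcast-add the same channel vector at every position.
        return x + t[:, :, None, None]


x = np.random.default_rng(1).normal(size=(2, 96, 14, 14))
print(simam(x).shape, GCBlock(96)(x).shape)  # (2, 96, 14, 14) (2, 96, 14, 14)
```

SimAM has no learnable weights, which matches the ablation table: adding SimAM alone leaves the parameter count at 27.83 × 10⁶. The GC Block, by contrast, carries three small weight tensors.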

Key words: capsule endoscopy, image classification, ConvNeXt, attention mechanism, Multi-scale Feature Fusion (MFF)

CLC Number: TP391.41

Xiang WANG, Qianqian CUI, Xiaoming ZHANG, Jianchao WANG, Zhenzhou WANG, Jialin SONG. Wireless capsule endoscopy image classification model based on improved ConvNeXt[J]. Journal of Computer Applications, 2025, 45(6): 2016-2024.

References

[1] ZAMMIT S C, SIDHU R. Capsule endoscopy - recent developments and future directions [J]. Expert Review of Gastroenterology and Hepatology, 2021, 15(2): 127-137.
[2] DRAY X, IAKOVIDIS D, HOUDEVILLE C, et al. Artificial intelligence in small bowel capsule endoscopy - current status, challenges and future promise [J]. Journal of Gastroenterology and Hepatology, 2021, 36(1): 12-19.
[3] WU H D, YANG J Y, WU Z L, et al. Application status of artificial intelligence in capsule endoscopy [J]. Clinical Research and Practice, 2024, 9(7): 195-198. (in Chinese)
[4] XIAO Z, FENG L N. A study on wireless capsule endoscopy for small intestinal lesions detection based on deep learning target detection [J]. IEEE Access, 2020, 8: 159017-159026.
[5] SUMAN S, HUSSIN F A B, MALIK A S, et al. Detection and classification of bleeding region in WCE images using color feature [C]// Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. New York: ACM, 2017: No.17.
[6] LIU G, YAN G, KUANG S, et al. Detection of small bowel tumor based on multi-scale curvelet analysis and fractal technology in capsule endoscopy [J]. Computers in Biology and Medicine, 2016, 70: 131-138.
[7] POGORELOV K, SUMAN S, HUSSIN F A, et al. Bleeding detection in wireless capsule endoscopy videos - color versus texture features [J]. Journal of Applied Clinical Medical Physics, 2019, 20(8): 141-154.
[8] AMIRI Z, HASSANPOUR H, BEGHDADI A. Abnormalities detection in wireless capsule endoscopy images using EM algorithm [J]. The Visual Computer, 2023, 39(7): 2999-3010.
[9] HWANG Y, LEE H H, PARK C, et al. Improved classification and localization approach to small bowel capsule endoscopy using convolutional neural network [J]. Digestive Endoscopy, 2021, 33(4): 598-607.
[10] MURUGANANTHAM P, BALAKRISHNAN S M. Attention aware deep learning model for wireless capsule endoscopy lesion classification and localization [J]. Journal of Medical and Biological Engineering, 2022, 42(2): 157-168.
[11] MARIN-SANTOS D, CONTRERAS-FERNANDEZ J A, PEREZ-BORRERO I, et al. Automatic detection of Crohn disease in wireless capsule endoscopic images using a deep convolutional neural network [J]. Applied Intelligence, 2023, 53(10): 12632-12646.
[12] YANG K, SUN Y F, WANG S W, et al. YOLOF-CBAM: a new real-time classification and detection method for colorectal polyps [J]. Electronic Measurement Technology, 2023, 46(16): 138-147. (in Chinese)
[13] SOUAIDI M, LAFRAXO S, KERKAOU Z, et al. A multiscale polyp detection approach for GI tract images based on improved DenseNet and single-shot multi-box detector [J]. Diagnostics, 2023, 13(4): No.733.
[14] JAIN S, SEAL A, OJHA A, et al. Detection of abnormality in wireless capsule endoscopy images using fractal features [J]. Computers in Biology and Medicine, 2020, 127: No.104094.
[15] AN C, WANG C L, LIAO C, et al. Wireless capsule endoscopy image classification method based on attention relational network [J]. Computer Engineering, 2021, 47(10): 252-259, 268. (in Chinese)
[16] XIAO P, PAN Y, CAI F, et al. A deep learning based framework for the classification of multi-class capsule gastroscope image in gastroenterologic diagnosis [J]. Frontiers in Physiology, 2022, 13: No.1060591.
[17] MOHAPATRA S, KUMAR PATI G, MISHRA M, et al. Gastrointestinal abnormality detection and classification using empirical wavelet transform and deep convolutional neural network from endoscopic images [J]. Ain Shams Engineering Journal, 2023, 14(4): No.101942.
[18] MUKHTOROV D, RAKHMONOVA M, MUKSIMOVA S, et al. Endoscopic image classification based on explainable deep learning [J]. Sensors, 2023, 23(6): No.3176.
[19] YU M. Study on organ classification of gastrointestinal capsule endoscope images [D]. Hangzhou: Zhejiang University of Technology, 2020. (in Chinese)
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[21] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2024-03-22].
[22] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[23] LIU Z, MAO H, WU C Y, et al. A ConvNet for the 2020s [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11966-11976.
[24] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical Vision Transformer using shifted windows [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
[25] YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 11863-11874.
[26] CAO Y, XU J, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 1971-1980.
[27] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803.
[28] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[29] WANG Y, MA X, CHEN Z, et al. Symmetric cross entropy for robust learning with noisy labels [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 322-330.
[30] WEN Y, ZHANG K, LI Z, et al. A discriminative feature learning approach for deep face recognition [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 499-515.
[31] SMEDSRUD P H, THAMBAWITA V, HICKS S A, et al. Kvasir-Capsule, a video capsule endoscopy dataset [J]. Scientific Data, 2021, 8: No.142.
[32] LIU Z, LV Q, LI Y, et al. MedAugment: universal automatic data augmentation plug-in for medical image analysis [EB/OL]. [2024-03-27].
[33] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
[34] WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539.
[35] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717.
[36] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2024-03-20].

Class	Training set (original)	Validation set (original)	Test set (original)	Training set after data augmentation
Total	2 344	780	777	5 279
Pylorus	321	107	106	642
Ileocecal valve	312	104	104	624
Angiectasia	304	101	101	608
Erythema	96	32	31	576
Lymphangiectasia	261	87	87	588
Polyp	33	11	11	396
Ulcer	273	91	90	602
Erosion	304	101	101	608
Normal	440	146	146	635
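The split in the table above can be sanity-checked with a few lines of Python. The script recomputes three things from the listed counts:
- the column totals;
- the train/validation/test proportions of the original images;
- the per-class growth of the training set after augmentation.

The class names are English renderings of the table labels, and the counts are copied verbatim. This is only arithmetic on the reported numbers.

```python
# (train, validation, test, train after augmentation), copied from the table above
counts = {
    "Pylorus": (321, 107, 106, 642),
    "Ileocecal valve": (312, 104, 104, 624),
    "Angiectasia": (304, 101, 101, 608),
    "Erythema": (96, 32, 31, 576),
    "Lymphangiectasia": (261, 87, 87, 588),
    "Polyp": (33, 11, 11, 396),
    "Ulcer": (273, 91, 90, 602),
    "Erosion": (304, 101, 101, 608),
    "Normal": (440, 146, 146, 635),
}

totals = [sum(row[i] for row in counts.values()) for i in range(4)]
print("column totals:", totals)  # [2344, 780, 777, 5279]

original = sum(totals[:3])  # all original images across the three splits
split = [round(t / original, 3) for t in totals[:3]]
print("train/val/test proportions:", split)

# Growth factor of each class's training set after augmentation.
for name, (train, _val, _test, augmented) in counts.items():
    print(f"{name:17s} x{augmented / train:.2f}")

# Ratio between the largest and smallest training classes.
train_sizes = [row[0] for row in counts.values()]
aug_sizes = [row[3] for row in counts.values()]
print("imbalance before:", round(max(train_sizes) / min(train_sizes), 2))
print("imbalance after: ", round(max(aug_sizes) / min(aug_sizes), 2))
```

The original images split roughly 60/20/20. Augmentation mainly enlarges the rarest classes (Polyp 12×, Erythema 6×). It shrinks the ratio between the largest and smallest training classes from about 13.3 to about 1.6.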

Experiment type	Module/Model	Accuracy/%	Mean precision/%	Mean recall/%	Mean F1 score/%	Parameters/10⁶
Attention mechanism comparison	CBAM	92.54	92.41	93.82	93.05	28.35
	ECA	93.05	94.11	94.22	94.11	28.25
	CA	93.95	94.35	94.65	94.47	28.40
	SimAM	94.85	94.70	95.46	95.02	28.25
Comparison with other models	VGG16 [16]	89.35	89.84	89.43	89.55	134.30
	ResNet18 [15]	88.93	88.66	91.13	89.69	11.18
	ResNet50	90.09	90.43	91.09	90.71	23.53
	ResNet101	91.63	90.55	93.23	91.70	42.52
	ResNet152 [18]	91.76	90.81	92.67	91.61	58.16
	ConvNeXt-T	91.89	90.86	93.24	91.86	27.83
	ConvNeXt-B	93.69	93.36	93.97	93.62	87.58
	ViT	91.25	90.51	92.03	91.09	88.19
	Swin-T	92.79	91.39	93.75	92.40	30.53
	Swin-B	94.34	93.41	95.49	94.35	86.75
	Proposed model	94.85	94.70	95.46	95.02	28.25
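The headline figures in the abstract can be re-derived from this table. The short Python check below takes the accuracy, mean F1 score and parameter count of ConvNeXt-T, Swin-B and the proposed model from the rows above. It recomputes the percentage-point gains and the relative parameter reduction. It is only arithmetic on the reported numbers, not a re-run of any experiment, and the helper names are hypothetical.

```python
# Values copied from the comparison table: (accuracy %, mean F1 %, parameters in 10^6)
results = {
    "ConvNeXt-T": (91.89, 91.86, 27.83),
    "Swin-B": (94.34, 94.35, 86.75),
    "Proposed": (94.85, 95.02, 28.25),
}


def pp_gain(model: str, baseline: str, idx: int) -> float:
    """Difference in percentage points for metric column idx (0 = accuracy, 1 = F1)."""
    return round(results[model][idx] - results[baseline][idx], 2)


def param_reduction(model: str, baseline: str) -> float:
    """Relative parameter reduction of `model` versus `baseline`, in percent."""
    ours, ref = results[model][2], results[baseline][2]
    return round(100 * (ref - ours) / ref, 1)


print(pp_gain("Proposed", "ConvNeXt-T", 0), pp_gain("Proposed", "ConvNeXt-T", 1))  # 2.96 3.16
print(pp_gain("Proposed", "Swin-B", 0), pp_gain("Proposed", "Swin-B", 1))          # 0.51 0.67
print(param_reduction("Proposed", "Swin-B"))                                       # 67.4
```

The accuracy and F1 gains are plain differences of percentages. The 67.4% figure, by contrast, is a relative reduction: (86.75 - 28.25) / 86.75.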

Model	SimAM	GC-MFF	Loss function	Accuracy/%	Mean precision/%	Mean recall/%	Mean F1 score/%	Parameters/10⁶
Model 1				91.89	90.86	93.24	91.86	27.83
Model 2	√			92.66	91.91	94.14	92.90	27.83
Model 3		√		93.05	92.95	93.37	93.04	28.25
Model 4			√	92.28	92.54	93.07	92.80	27.83
Model 5	√	√		94.08	93.11	95.22	94.03	28.25
Model 6	√		√	93.56	93.64	94.61	94.08	27.83
Model 7		√	√	94.16	92.27	95.26	93.59	28.25
Model 8	√	√	√	94.85	94.70	95.46	95.02	28.25
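The ablation table can be summarized by averaging, for each component, the accuracy (or F1) change observed when that component is switched on while the other two are held fixed. Because the table covers all eight on/off combinations, each component appears in four such pairs. This averaging is an illustrative summary added here, not an analysis reported by the authors. The function name `marginal_gain` is hypothetical.

```python
from statistics import mean

# Rows copied from the ablation table: (SimAM, GC-MFF, loss, accuracy %, mean F1 %)
ablation = [
    (False, False, False, 91.89, 91.86),  # Model 1: ConvNeXt-T baseline
    (True, False, False, 92.66, 92.90),   # Model 2
    (False, True, False, 93.05, 93.04),   # Model 3
    (False, False, True, 92.28, 92.80),   # Model 4
    (True, True, False, 94.08, 94.03),    # Model 5
    (True, False, True, 93.56, 94.08),    # Model 6
    (False, True, True, 94.16, 93.59),    # Model 7
    (True, True, True, 94.85, 95.02),     # Model 8: full model
]
COMPONENTS = ("SimAM", "GC-MFF", "loss")


def marginal_gain(rows, k, metric=3):
    """Mean change in column `metric` when component k is switched on, others fixed."""
    by_flags = {row[:3]: row for row in rows}
    deltas = []
    for flags, row in by_flags.items():
        if not flags[k]:
            # Same configuration, but with component k enabled.
            with_k = flags[:k] + (True,) + flags[k + 1:]
            deltas.append(by_flags[with_k][metric] - row[metric])
    return mean(deltas)


for k, name in enumerate(COMPONENTS):
    print(f"{name:7s} accuracy {marginal_gain(ablation, k):+.2f} pp, "
          f"F1 {marginal_gain(ablation, k, metric=4):+.2f} pp")
```

On these numbers, GC-MFF gives the largest average accuracy gain (about 1.44 points). SimAM follows with about 0.94, and the loss modification with about 0.79.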
