Shared transformation matrix capsule network for complex image classification

Journal of Computer Applications, 2023, Vol. 43, Issue (11): 3411-3417. DOI: 10.11772/j.issn.1001-9081.2022101596

• Artificial Intelligence •

Kai WEN, Xiao XUE, Juan JI

College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 401520, China

Received: 2022-10-26 Revised: 2023-04-03 Accepted: 2023-04-06 Online: 2023-05-24 Published: 2023-11-10
Contact: Xiao XUE
About author: WEN Kai, born in 1972 in Chongqing, Ph. D., senior engineer. His research interests include mobile communication and computer vision.
XUE Xiao, born in 1996 in Yuncheng, Shanxi, M. S. candidate. His research interests include image classification and target detection. E-mail: 1464090345@qq.com
JI Juan, born in 1998 in Guang'an, Sichuan, M. S. candidate. Her research interests include image denoising and image segmentation.

Abstract:

To address the poor classification performance and high computational overhead of the Capsule Network (CapsNet) on complex images containing background noise, an improved capsule network model based on an attention mechanism and weight sharing, called Shared Transformation Matrix CapsNet (STM-CapsNet), is proposed. The model makes three main improvements. 1) An attention module is introduced into the feature extraction layer so that low-level capsules focus on entity features relevant to the classification task. 2) Low-level capsules that are spatially close are divided into several groups, and the low-level capsules within each group are mapped to high-level capsules through a shared transformation matrix, which reduces computational overhead and improves model robustness. 3) An L₂ regularization term is added to the margin loss and reconstruction loss to prevent overfitting. Experimental results on three complex image datasets, CIFAR10, SVHN (Street View House Numbers) and FashionMNIST, show that each improvement effectively enhances model performance; when the number of iterations is 3 and the number of shared transformation matrices is 5, the average accuracies of STM-CapsNet are 85.26%, 93.17% and 94.96% respectively, with an average parameter size of 8.29 MB, giving better overall performance than the baseline models.

Key words: Capsule Network (CapsNet), image classification, attention mechanism, shared transformation matrix, deep learning
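
The group-wise weight sharing described in improvement 2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes (1152 low-level capsules of dimension 8, 10 high-level capsules of dimension 16) are borrowed from the original CapsNet [5] as assumptions, contiguous index blocks stand in for "spatially close" capsules, and the function names `shared_matrix_predictions` and `transform_param_count` are hypothetical. Only the group count of 5 comes from the abstract.

```python
import numpy as np


def shared_matrix_predictions(u, group_ids, W):
    """Map low-level capsules to prediction vectors with group-shared matrices.

    u:         (n_low, d_low) low-level capsule vectors
    group_ids: (n_low,) index of the group each low-level capsule belongs to
    W:         (n_groups, n_high, d_high, d_low) one matrix set per group
    returns    (n_low, n_high, d_high) prediction vectors u_hat
    """
    # Every capsule looks up the matrix set of its group instead of owning one.
    W_per_capsule = W[group_ids]  # (n_low, n_high, d_high, d_low)
    return np.einsum("ijkl,il->ijk", W_per_capsule, u)


def transform_param_count(n_matrix_sets, n_high, d_high, d_low):
    """Number of weights held in the transformation matrices."""
    return n_matrix_sets * n_high * d_high * d_low


# Assumed sizes (original CapsNet layout); only n_groups = 5 is from the abstract.
n_low, d_low, n_high, d_high, n_groups = 1152, 8, 10, 16, 5
rng = np.random.default_rng(0)
u = rng.normal(size=(n_low, d_low))
# Assumption: contiguous index blocks approximate spatial neighbourhoods.
group_ids = np.arange(n_low) * n_groups // n_low
W = rng.normal(size=(n_groups, n_high, d_high, d_low)) * 0.01

u_hat = shared_matrix_predictions(u, group_ids, W)
unshared = transform_param_count(n_low, n_high, d_high, d_low)
shared = transform_param_count(n_groups, n_high, d_high, d_low)
print(u_hat.shape, unshared, shared)
```

Under these assumed sizes, sharing one matrix set per group shrinks the transformation weights from one set per low-level capsule to one set per group, which is the mechanism the abstract credits for the lower computational overhead. The full model also contains convolutional and reconstruction layers, so this count is not comparable to the parameter sizes reported in MB.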

CLC Number:

TP391.41

Kai WEN, Xiao XUE, Juan JI. Shared transformation matrix capsule network for complex image classification[J]. Journal of Computer Applications, 2023, 43(11): 3411-3417.

References (26)

1	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. DOI: 10.1145/3065386
2	FAN Y, LI Y, WANG S, et al. Application of YOLOv5 neural network based on improved attention mechanism in recognition of Thangka image defects[J]. KSII Transactions on Internet and Information Systems, 2022, 16(1): 245-265. DOI: 10.3837/tiis.2022.01.014
3	CAI J, LI J, LI W, et al. Deep learning model used in text classification[C]// Proceedings of the 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing. Piscataway: IEEE, 2018: 123-126. DOI: 10.1109/iccwamtip.2018.8632592
4	RATNER A J, EHRENBERG H R, HUSSAIN Z, et al. Learning to compose domain-specific transformations for data augmentation[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 3239-3249.
5	SABOUR S, FROSST N, HINTON G E. Dynamic routing between capsules[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 3859-3869.
6	AFSHAR P, MOHAMMADI A, PLATANIOTIS K N. Brain tumor type classification via capsule networks[C]// Proceedings of the 2018 25th IEEE International Conference on Image Processing. Piscataway: IEEE, 2018: 3124-3128. DOI: 10.1109/icip.2018.8451379
7	MUKHOMETZIANOV R, CARRILLO J. CapsNet comparative performance evaluation for image classification[EB/OL]. [2022-10-15].
8	WANG K, HE R, WANG S, et al. The efficient-CapsNet model for facial expression recognition[J]. Applied Intelligence, 2023, 53: 16367-16380. DOI: 10.1007/s10489-022-04349-8
9	HUANG W, ZHOU F. DA-CapsNet: dual attention mechanism capsule network[J]. Scientific Reports, 2020, 10(1): Article No. 11383. DOI: 10.1038/s41598-020-68453-w
10	JIA X, LI J, ZHAO B, et al. Res-CapsNet: residual capsule network for data classification[J]. Neural Processing Letters, 2022, 54: 4229-4245. DOI: 10.1007/s11063-022-10806-9
11	CHENG X, HE J, HEA J, et al. Cv-CapsNet: complex-valued capsule network[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(2): 829-839.
12	MOBINY A, VAN NGUYEN H. Fast CapsNet for lung cancer screening[C]// Proceedings of the 2018 21st International Conference on Medical Image Computing and Computer Assisted Intervention, LNIP 11071. Cham: Springer, 2018: 706-714.
13	RAJASEGARAN J, JAYASUNDARA V, JAYASEKARA S, et al. DeepCaps: going deeper with capsule networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2019: 10728-10737. DOI: 10.1109/cvpr.2019.01098
14	SHIRI P, SHARIFI R, BANIASADI A. Quick-CapsNet (QCN): a fast alternative to capsule networks[C]// Proceedings of the 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications. Piscataway: IEEE, 2020: 1-8. DOI: 10.1109/aiccsa50499.2020.9316525
15	LI X, WANG L. β-CapsNet: learning disentangled representation for CapsNet by information bottleneck[J]. Neural Computing and Applications, 2022, 33(1): 1-13.
16	YIN C Y, HE M. Text classification based on improved capsule network[J]. Journal of Computer Applications, 2020, 40(9): 2525-2530. DOI: 10.11772/j.issn.1001-9081.2019122153
17	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
18	REN X L, LI X Q, YAN Y H, et al. Study on attention mechanism and its role in medical visual task[J]. Image Technology, 2023, 35(1): 76-80. DOI: 10.3969/j.issn.1001-0270.2023.01.14
19	KRIZHEVSKY A, HINTON G E. Learning multiple layers of features from tiny images[J]. Handbook of Systemic Autoimmune Diseases, 2009, 1(4): 1201-1208.
20	XIAO H, RASUL K, VOLLGRAF R, et al. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms[EB/OL]. [2022-10-15].
21	WEI X, YANG F, WU C. Deep residual networks of residual networks for image super-resolution[C]// Proceedings of the 2017 LIDAR Imaging Detection & Target Recognition, SPIE 10605. Bellingham, WA: SPIE, 2017: 1132-1140.
22	SUN K, WEN X, YUAN L, et al. Dense capsule networks with fewer parameters[J]. Soft Computing, 2021, 25(10): 6927-6945. DOI: 10.1007/s00500-021-05774-6
23	DELIÈGE A, CIOPPA A, DROOGENBROECK M V. HitNet: a neural network with capsules embedded in a Hit-or-Miss layer, extended with hybrid data augmentation and ghost capsules[EB/OL]. [2022-10-15]. DOI: 10.48550/arXiv.1806.06519
24	ROSARIO V M D, BORIN E, BRETERNITZ M, Jr. The Multi-Lane Capsule Network (MLCN)[J]. IEEE Signal Processing Letters, 2019, 26(7): 1006-1010. DOI: 10.1109/lsp.2019.2915661
25	XIANG C, LU Z, ZOU W, et al. MS-CapsNet: a novel multi-scale capsule network[J]. IEEE Signal Processing Letters, 2018, 25(12): 1850-1854. DOI: 10.1109/lsp.2018.2873892
26	ZHOU D, KANG B, JIN X, et al. DeepViT: towards deeper vision transformer[EB/OL]. [2022-10-15].

r	CIFAR10	SVHN	FashionMNIST
1	50.23	47.87	64.76
2	65.11	61.13	72.73
3	83.41	90.25	87.46
4	71.14	72.52	74.94
5	75.81	74.59	73.24
6	73.47	65.45	60.26
7	68.76	62.83	53.27
8	65.48	67.84	54.53
9	69.63	54.91	57.17
10	62.72	57.11	49.23
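
If r in the table above denotes the number of dynamic routing iterations (an assumption; the abstract only says the number of iterations is 3), the quantity being varied can be sketched as below. This is the routing-by-agreement procedure of the original CapsNet [5] with the standard squash function, not necessarily the variant used in STM-CapsNet, and the capsule counts in the usage lines are arbitrary.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Standard CapsNet squash: keeps direction, maps length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, r=3):
    """Routing-by-agreement over prediction vectors (requires r >= 1).

    u_hat: (n_low, n_high, d_high) prediction vectors
    r:     number of routing iterations
    returns (n_high, d_high) high-level capsule vectors
    """
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))  # routing logits, start uniform
    for it in range(r):
        # Coupling coefficients: softmax over the high-level capsules.
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijk->jk", c, u_hat)  # weighted sum of predictions
        v = squash(s)
        if it < r - 1:
            # Raise logits where a prediction agrees with the current output.
            b = b + np.einsum("ijk,jk->ij", u_hat, v)
    return v


# Arbitrary illustrative sizes: 32 low-level, 10 high-level capsules of dim 16.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 10, 16))
v = dynamic_routing(u_hat, r=3)
print(v.shape)
```

With r = 1 the coupling coefficients stay uniform and no agreement is used; larger r sharpens the couplings at the cost of extra computation per forward pass.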

Dataset	e-squash	L₂ regularization term	CV-CapsNe		DA-CapsNet
			Training set	Test set	Training set	Test set
CIFAR10			87.29	83.41	89.31	85.47
	√		88.13	83.94	90.01	85.72
		√	88.59	84.76	87.49	86.17
	√	√	89.26	85.48	89.21	88.20
SVHN			91.35	90.25	93.42	94.16
	√		90.95	89.86	93.87	93.25
		√	92.05	91.35	94.17	92.19
	√	√	92.39	90.83	94.68	94.24
FashionMNIST			92.45	87.46	94.12	93.98
	√		92.57	90.14	94.37	92.19
		√	93.04	89.43	92.14	91.36
	√	√	92.68	91.81	94.70	94.12
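
For orientation, the loss that the L₂ regularization term in this table refers to can be written out as a sketch. The margin loss is the standard form from the original CapsNet [5]; the weights α and β, and the choice of which parameters W enter the penalty, are assumptions, since this page does not state them.

```latex
\begin{aligned}
L_k &= T_k \max\bigl(0,\; m^{+} - \lVert \mathbf{v}_k \rVert\bigr)^2
      + \lambda\,(1 - T_k) \max\bigl(0,\; \lVert \mathbf{v}_k \rVert - m^{-}\bigr)^2, \\
L_{\text{total}} &= \sum_k L_k
      + \alpha\, \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert_2^2
      + \beta \sum_{w \in W} w^2 .
\end{aligned}
```

Here T_k is 1 when class k is present and 0 otherwise, v_k is the output vector of high-level capsule k, x and x̂ are the input image and its reconstruction, and W is the set of trainable weights being penalized. The original CapsNet uses m⁺ = 0.9, m⁻ = 0.1, λ = 0.5 and a reconstruction weight of 0.0005; whether STM-CapsNet keeps these values is not stated here.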

N	Accuracy	Parameter size/MB	N	Accuracy	Parameter size/MB
1	0.513	4.37	6	0.819	17.26
2	0.716	7.26	7	0.851	19.81
3	0.812	10.83	8	0.845	22.68
4	0.824	13.57	9	0.855	26.14
5	0.841	16.35	10	0.860	31.65

Model	Classification accuracy/%			Average parameter size/MB
	CIFAR10	FashionMNIST	SVHN
DA-CapsNet [9]	85.47	91.98	94.82	11.75
CV-CapsNet++ [11]	84.21	92.79	91.35	20.69
Quick-CapsNet [14]	67.18	88.84	86.20	9.81
DenseCapsNet [22]	86.17	84.39	93.18	1.64
HitNet [23]	73.30	-	-	-
MLcn2 [24]	75.18	92.63	89.78	10.63
MS-CapsNet [25]	75.70	92.70	90.34	10.89
STM-CapsNet	85.26	93.17	94.96	8.29
