Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (11): 3579-3586. DOI: 10.11772/j.issn.1001-9081.2022111660
Special topic: Multimedia computing and computer simulation
Qiangqiang QIN, Junguo LIAO, Yixun ZHOU

Received: 2022-11-09
Revised: 2023-03-03
Accepted: 2023-03-03
Online: 2023-03-20
Published: 2023-11-10
Contact: Junguo LIAO
About author: QIN Qiangqiang, born in 1997 in Wuhu, Anhui, M.S. candidate, CCF member. His research interests include artificial intelligence and object detection.

Abstract:
Small objects in images carry little feature information, occupy a small proportion of pixels, and are easily affected by the environment. To address these characteristics, a small object detection algorithm based on split mixed attention, SMAM-YOLO, was proposed. First, channel attention (CA) and spatial attention (SA) were combined and their connection structure was reorganized, yielding a Mixed Attention Module (MAM) that strengthens the model's ability to express small-object features in the spatial dimension. Second, since receptive fields of different sizes affect objects differently, a Split Mixed Attention Module (SMAM) was proposed on top of the mixed attention: the receptive-field size is adapted to the scale of the input feature map, while mixed attention enhances the capture of small-object feature information in each branch. Finally, SMAM was used to improve the core residual module of YOLOv5, producing a feature extraction module, CSMAM, based on CSPNet (Cross Stage Partial Network) and SMAM, whose extra computational cost is negligible. Experimental results on the TinyPerson dataset show that, compared with the baseline YOLOv5s, SMAM-YOLO improves the mean average precision at an Intersection over Union (IoU) threshold of 0.5 (mAP50) by 4.15 percentage points while reaching a detection speed of 74 frame/s; moreover, compared with several mainstream small object detection models, SMAM-YOLO improves mAP50 by 1.46 to 6.84 percentage points on average and meets real-time detection requirements.
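The MAM described above combines channel attention and spatial attention; the paper's exact connection structure is not reproduced here, so the following is only a minimal NumPy sketch of one common serial CA-then-SA combination (CBAM-style), with the usual learned convolutions replaced by fixed pooling statistics for illustration. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """Channel attention (CA): reweight each channel by a squashed
    global-average-pooled descriptor (SE-style stand-in)."""
    # x: (C, H, W) feature map
    w = sigmoid(x.mean(axis=(1, 2)))              # (C,) channel weights in (0, 1)
    return x * w[:, None, None]

def spatial_attention(x):
    """Spatial attention (SA): reweight each location by squashed
    cross-channel mean and max statistics (CBAM-style stand-in)."""
    avg = x.mean(axis=0)                          # (H, W)
    mx = x.max(axis=0)                            # (H, W)
    w = sigmoid(0.5 * (avg + mx))                 # stand-in for the usual 7x7 conv
    return x * w[None, :, :]

def mixed_attention(x):
    """One possible serial combination: CA first, then SA."""
    return spatial_attention(channel_attention(x))

feat = np.random.rand(8, 16, 16).astype(np.float32)   # toy (C, H, W) feature map
out = mixed_attention(feat)
print(out.shape)  # (8, 16, 16): attention preserves the feature-map shape
```

Because both weight maps lie in (0, 1), the module only rescales activations and keeps the feature-map shape, which is what lets such a block be dropped into an existing backbone.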
Qiangqiang QIN, Junguo LIAO, Yixun ZHOU. Small object detection algorithm based on split mixed attention[J]. Journal of Computer Applications, 2023, 43(11): 3579-3586.
| Model | Input resolution | Params/10⁶ | Model size/MB | GFLOPs | mAP50/% | FPS@1 280/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| YOLOv5s | 640×640 | 7.02 | 56.81 | 15.8 | 32.27 | 122 |
| | 960×960 | 7.02 | 57.02 | 33.9 | 41.34 | 122 |
| | 1 280×1 280 | 7.02 | 57.31 | 57.3 | 47.92 | 122 |
| SMAM-YOLO | 640×640 | 7.37 | 60.59 | 19.9 | 38.16 | 74 |
| | 960×960 | 7.37 | 61.44 | 42.6 | 45.82 | 74 |
| | 1 280×1 280 | 7.37 | 62.62 | 74.4 | 52.07 | 74 |
Tab. 1 Experimental results of resolution
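The mAP50 figures in the tables are evaluated at an IoU threshold of 0.5. For reference, a minimal pure-Python sketch of the IoU computation between two axis-aligned boxes (a hypothetical helper, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    # Union = sum of areas minus the intersection counted twice
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```

A predicted box counts toward mAP50 only when its IoU with a ground-truth box reaches 0.5, which is a deliberately loose threshold: for tiny objects, a shift of a few pixels already changes IoU drastically.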
| No. | Baseline | P2 | SMAM | CSMAM | Layers | Params/10⁶ | Model size/MB | GFLOPs | mAP50/% | FPS/(frame·s⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|
| a | √ | | | | 270 | 7.02 | 57.31 | 57.27 | 47.92 | 122 |
| b | √ | √ | | | 328 | 7.17 | 60.63 | 65.39 | 49.70 | 91 |
| c | √ | √ | √ | | 496 | 7.62 | 64.43 | 76.16 | 51.71 | 78 |
| d | √ | √ | √ | √ | 587 | 7.37 | 62.62 | 74.40 | 52.07 | 74 |
Tab. 2 Ablation experimental results
| Model | Params/10⁶ | Model size/MB | GFLOPs | mAP50/% | FPS/(frame·s⁻¹) |
|---|---|---|---|---|---|
| CBAM | 7.23 | 60.41 | 64.56 | 50.61 | 77.02 |
| YOLOX-S | 9.01 | 212.23 | 92.99 | 47.61 | 69.98 |
| PP-YOLO-S | 7.91 | 59.16 | 63.37 | 48.23 | 117.08 |
| DETR | 41.00 | 123.65 | 86.01 | 46.16 | 27.90 |
| YOLOv7-tiny | 6.02 | 48.58 | 47.38 | 45.23 | 131.21 |
| YOLOv5s | 7.02 | 60.28 | 60.28 | 50.02 | 63.29 |
| SMAM-YOLO | 7.37 | 62.62 | 74.42 | 52.07 | 74.07 |
Tab. 3 Comparison experimental results of different small object detection models
[1] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2017: 936-944. DOI: 10.1109/cvpr.2017.106
[2] LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8759-8768. DOI: 10.1109/cvpr.2018.00913
[3] GHIASI G, LIN T Y, LE Q V. NAS-FPN: learning scalable feature pyramid architecture for object detection[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2019: 7029-7038. DOI: 10.1109/cvpr.2019.00720
[4] LIANG Z, SHAO J, ZHANG D, et al. Small object detection using deep feature pyramid networks[C]// Proceedings of the 2018 Pacific Rim Conference on Multimedia, LNCS 11166. Cham: Springer, 2018: 554-564.
[5] TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2020: 10778-10787. DOI: 10.1109/cvpr42600.2020.01079
[6] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[7] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
[8] QIN Z, ZHANG P, WU F, et al. FcaNet: frequency channel attention networks[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 763-772. DOI: 10.1109/iccv48922.2021.00082
[9] WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2020: 1571-1580. DOI: 10.1109/cvprw50498.2020.00203
[10] LI K C, WANG X Q, LIN H, et al. A survey of one-stage small object detection methods in deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 41-58. DOI: 10.3778/j.issn.1673-9418.2110003
[11] KISANTAL M, WOJNA Z, MURAWSKI J, et al. Augmentation for small object detection[EB/OL]. [2023-02-12]. DOI: 10.5121/csit.2019.91713
[12] GONG Y, YU X, DING Y, et al. Effective fusion factor in FPN for tiny object detection[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 1159-1167. DOI: 10.1109/wacv48630.2021.00120
[13] JIANG N, YU X, PENG X, et al. SM+: refined scale match for tiny person detection[C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 1815-1819. DOI: 10.1109/icassp39728.2021.9414162
[14] LI W T, PENG L. Small objects detection algorithm with multi-scale channel attention fusion network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(12): 2390-2400.
[15] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2015: 1-9. DOI: 10.1109/cvpr.2015.7298594
[16] XIE S, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2017: 5987-5995. DOI: 10.1109/cvpr.2017.634
[17] LI X, WANG W, HU X, et al. Selective kernel networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2019: 510-519. DOI: 10.1109/cvpr.2019.00060
[18] ZHANG H, WU C, ZHANG Z, et al. ResNeSt: split-attention networks[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2022: 2735-2745. DOI: 10.1109/cvprw56347.2022.00309
[19] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
[20] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems — Volume 1. Cambridge: MIT Press, 2015: 91-99.
[21] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. DOI: 10.1109/iccv.2017.322
[22] CAO J L, LI Y L, SUN H Q, et al. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722. DOI: 10.11834/jig.220069
[23] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
[24] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-02-12].
[25] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-02-12].
[26] YU X, GONG Y, JIANG N, et al. Scale match for tiny person detection[C]// Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2020: 1246-1254. DOI: 10.1109/wacv45572.2020.9093394
[27] LONG X, DENG K, WANG G, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. [2023-02-12]. DOI: 10.48550/arXiv.2007.12099
[28] ZHU X, SU W, LU L, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. [2023-02-12].
[29] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. [2023-02-12]. DOI: 10.48550/arXiv.2207.02696
[30] GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2023-02-12].
[31] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626. DOI: 10.1109/iccv.2017.74