Small object detection algorithm based on split mixed attention

doi:10.11772/j.issn.1001-9081.2022111660

Abstract

Small objects in images carry little feature information, occupy a small fraction of the image, and are easily affected by the environment. To address these characteristics, a small object detection algorithm based on split mixed attention, SMAM-YOLO, was proposed. Firstly, by combining Channel Attention (CA) and Spatial Attention (SA) and recombining their connection structures, a Mixed Attention Module (MAM) was proposed to enhance the model's representation of small object features in the spatial dimension. Secondly, because receptive fields of different sizes influence objects differently, a Split Mixed Attention Module (SMAM) was proposed on the basis of mixed attention; it adaptively adjusts the receptive field size according to the scale of the input feature map and uses mixed attention to strengthen the capture of small object feature information in each branch. Finally, the core residual module in YOLOv5 was improved with SMAM, and a feature extraction module, CSMAM, was proposed on the basis of CSPNet (Cross Stage Partial Network) and SMAM; the additional computational overhead of CSMAM is negligible. Experimental results on the TinyPerson dataset show that, compared with the baseline algorithm YOLOv5s, SMAM-YOLO improves the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP₅₀) by 4.15 percentage points while reaching a detection speed of 74 frame/s. In addition, compared with several existing mainstream small object detection models, SMAM-YOLO improves mAP₅₀ by 1.46 to 6.84 percentage points, and it meets the requirements of real-time detection.
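The mAP₅₀ figures above count a detection as correct only when its IoU with a ground-truth box reaches 0.5. A minimal sketch of that test follows. It is not code from the paper: the `(x1, y1, x2, y2)` box layout, the function names, and the `>=` comparison are assumptions made for illustration. It also shows why small objects are sensitive to localization error: the same 2-pixel shift that is tolerated on a 10×10 box fails the threshold on a 4×4 box.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count at this threshold."""
    return iou(pred_box, gt_box) >= threshold


# A 10x10 object and a prediction shifted right by 2 pixels.
large_iou = iou((2, 0, 12, 10), (0, 0, 10, 10))
# A 4x4 object and a prediction shifted right by the same 2 pixels.
small_iou = iou((2, 0, 6, 4), (0, 0, 4, 4))

print(round(large_iou, 3), is_match((2, 0, 12, 10), (0, 0, 10, 10)))  # 0.667 True
print(round(small_iou, 3), is_match((2, 0, 6, 4), (0, 0, 4, 4)))      # 0.333 False
```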

Key words: small object detection, split network, mixed attention, feature fusion, real-time detection

CLC Number: TP391

Qiangqiang QIN, Junguo LIAO, Yixun ZHOU. Small object detection algorithm based on split mixed attention[J]. Journal of Computer Applications, 2023, 43(11): 3579-3586.


Model	Input resolution	Params/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS at 1280×1280/(frame·s⁻¹)
YOLOv5s	640×640	7.02	56.81	15.8	32.27	122
YOLOv5s	960×960	7.02	57.02	33.9	41.34	122
YOLOv5s	1280×1280	7.02	57.31	57.3	47.92	122
SMAM-YOLO	640×640	7.37	60.59	19.9	38.16	74
SMAM-YOLO	960×960	7.37	61.44	42.6	45.82	74
SMAM-YOLO	1280×1280	7.37	62.62	74.4	52.07	74
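The arithmetic behind the table above can be checked with a short script. This is a sketch only: the numbers are copied from the table, and the variable names are hypothetical. It computes the mAP₅₀ gain of SMAM-YOLO over YOLOv5s at each input resolution, plus the ratio of GFLOPs as a rough measure of the added compute.

```python
# (input side in pixels, YOLOv5s GFLOPs, YOLOv5s mAP50 %, SMAM-YOLO GFLOPs, SMAM-YOLO mAP50 %)
# Values copied from the resolution table; names are illustrative.
rows = [
    (640, 15.8, 32.27, 19.9, 38.16),
    (960, 33.9, 41.34, 42.6, 45.82),
    (1280, 57.3, 47.92, 74.4, 52.07),
]

# Gain in percentage points at each resolution.
map_gain = {side: round(ours - base, 2) for side, _, base, _, ours in rows}
# How many times more floating-point work SMAM-YOLO does than the baseline.
gflops_ratio = {side: round(ours / base, 2) for side, base, _, ours, _ in rows}

for side in map_gain:
    print(f"{side}x{side}: +{map_gain[side]:.2f} points mAP50, {gflops_ratio[side]:.2f}x GFLOPs")
```

The gain at 1280×1280 works out to 4.15 percentage points, which is the figure quoted in the abstract; the gains at the two lower resolutions are larger (5.89 and 4.48 points).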

No.	Baseline	P2	SMAM	CSMAM	Layers	Params/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS/(frame·s⁻¹)
a	√				270	7.02	57.31	57.27	47.92	122
b	√	√			328	7.17	60.63	65.39	49.70	91
c	√	√	√		496	7.62	64.43	76.16	51.71	78
d	√	√	√	√	587	7.37	62.62	74.40	52.07	74
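To read the ablation rows as a sequence of incremental changes, the step-by-step differences can be computed as below. This is an illustrative sketch: the values come from rows a to d above, while the labels and field names are hypothetical.

```python
# (row label, component added in that row, params in 10^6, GFLOPs, mAP50 in %, FPS)
# Values copied from ablation rows a-d; labels are illustrative.
ablation = [
    ("a", "baseline", 7.02, 57.27, 47.92, 122),
    ("b", "+P2", 7.17, 65.39, 49.70, 91),
    ("c", "+SMAM", 7.62, 76.16, 51.71, 78),
    ("d", "+CSMAM", 7.37, 74.40, 52.07, 74),
]

# Difference between each row and the row before it.
steps = []
for prev, cur in zip(ablation, ablation[1:]):
    steps.append({
        "step": f"{prev[0]}->{cur[0]}",
        "added": cur[1],
        "d_params": round(cur[2] - prev[2], 2),
        "d_gflops": round(cur[3] - prev[3], 2),
        "d_map50": round(cur[4] - prev[4], 2),
        "d_fps": cur[5] - prev[5],
    })

total_gain = round(ablation[-1][4] - ablation[0][4], 2)

for s in steps:
    print(s)
print("total mAP50 gain:", total_gain)
```

Moving from row c to row d raises mAP₅₀ by 0.36 points while parameters and GFLOPs both decrease, which is consistent with the abstract's statement that CSMAM adds negligible overhead.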

Model	Params/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS/(frame·s⁻¹)
CBAM	7.23	60.41	64.56	50.61	77.02
YOLOX-S	9.01	212.23	92.99	47.61	69.98
PP-YOLO-S	7.91	59.16	63.37	48.23	117.08
DETR	41.00	123.65	86.01	46.16	27.90
YOLOv7-tiny	6.02	48.58	47.38	45.23	131.21
YOLOv5s	7.02	60.28	60.28	50.02	63.29
SMAM-YOLO	7.37	62.62	74.42	52.07	74.07
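The 1.46 to 6.84 percentage point range quoted in the abstract can be reproduced from the mAP₅₀ column of the comparison table. The sketch below uses values copied from that table with hypothetical variable names. It also converts the SMAM-YOLO frame rate to a per-frame time budget.

```python
# mAP50 (%) of the compared models, copied from the comparison table.
map50 = {
    "CBAM": 50.61,
    "YOLOX-S": 47.61,
    "PP-YOLO-S": 48.23,
    "DETR": 46.16,
    "YOLOv7-tiny": 45.23,
    "YOLOv5s": 50.02,
}
smam_yolo_map50 = 52.07
smam_yolo_fps = 74.07

# Margin of SMAM-YOLO over each compared model, in percentage points.
margins = {name: round(smam_yolo_map50 - value, 2) for name, value in map50.items()}
closest = min(margins, key=margins.get)
farthest = max(margins, key=margins.get)

# Average time spent on one frame at the reported frame rate.
frame_time_ms = round(1000 / smam_yolo_fps, 1)

print(f"smallest margin: {margins[closest]:.2f} points over {closest}")    # 1.46 over CBAM
print(f"largest margin: {margins[farthest]:.2f} points over {farthest}")   # 6.84 over YOLOv7-tiny
print(f"time per frame: {frame_time_ms} ms")
```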