Small object detection algorithm based on split mixed attention

doi:10.11772/j.issn.1001-9081.2022111660

Journal of Computer Applications, 2023, Vol. 43, Issue 11: 3579-3586.

• Multimedia computing and computer simulation •

Qiangqiang QIN (秦强强), Junguo LIAO (廖俊国), Yixun ZHOU (周弋荀)

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China

Received: 2022-11-09 Revised: 2023-03-03 Accepted: 2023-03-03 Online: 2023-03-20 Published: 2023-11-10
Corresponding author: Junguo LIAO (jgliao@hnust.edu.cn)
About the authors: QIN Qiangqiang, born in 1997 in Wuhu, Anhui, M. S. candidate, CCF member. His research interests include artificial intelligence and object detection.
LIAO Junguo, born in 1972 in Hengyang, Hunan, Ph. D., professor, CCF member. His research interests include cyber security, artificial intelligence, and pattern recognition.
ZHOU Yixun, born in 1998 in Huangshi, Hubei, M. S. candidate, CCF member. His research interests include artificial intelligence and object detection.

Abstract

Small objects in images carry little feature information, occupy a small fraction of the image, and are easily affected by the environment. To address these characteristics, a small object detection algorithm based on split mixed attention, named SMAM-YOLO, is proposed. First, Channel Attention (CA) and Spatial Attention (SA) are combined and their connection structure is rearranged to form a Mixed Attention Module (MAM), which strengthens the model's representation of small object features in the spatial dimension. Second, because receptive fields of different sizes affect objects differently, a Split Mixed Attention Module (SMAM) is built on the mixed attention: it adaptively adjusts the receptive field size according to the scale of the input feature map, and it uses mixed attention to improve the capture of small object feature information in each branch. Finally, SMAM is used to improve the core residual module of YOLOv5, yielding CSMAM, a feature extraction module based on CSPNet (Cross Stage Partial Network) and SMAM whose additional computational overhead is negligible. Experimental results on the TinyPerson dataset show that, compared with the baseline algorithm YOLOv5s, SMAM-YOLO improves the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP₅₀) by 4.15 percentage points while reaching a detection speed of 74 frame/s. Compared with several existing mainstream small object detection models, SMAM-YOLO improves mAP₅₀ by 1.46 to 6.84 percentage points and still meets the requirements of real-time detection.

Key words: small object detection, split network, mixed attention, feature fusion, real-time detection
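To make the idea of combining channel and spatial attention concrete, the following is a minimal toy sketch in plain Python. It is not the MAM described in the abstract: the abstract does not specify the connection structure, so the sketch assumes the simplest sequential arrangement (channel gate first, then spatial gate) and uses parameter-free sigmoid gates instead of learned layers. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by the sigmoid of its global average (no learned weights)."""
    out = []
    for channel in fmap:
        count = sum(len(row) for row in channel)
        gate = sigmoid(sum(sum(row) for row in channel) / count)
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each pixel by the sigmoid of its mean response across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[fmap[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def mixed_attention(fmap):
    """Assumed ordering: channel gate followed by spatial gate."""
    return spatial_attention(channel_attention(fmap))


# Two 2x2 channels: channel 0 has one strong response, channel 1 is flat.
fmap = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
out = mixed_attention(fmap)
```

In this toy input, channel 1 is uniform, yet its bottom-right output ends up larger than its top-left output, because the spatial gate picks up the strong response that channel 0 has at that position.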

CLC number: TP391

Qiangqiang QIN, Junguo LIAO, Yixun ZHOU. Small object detection algorithm based on split mixed attention[J]. Journal of Computer Applications, 2023, 43(11): 3579-3586.

Model	Input resolution	Parameters/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS at 1 280×1 280/(frame·s⁻¹)
YOLOv5s	640×640	7.02	56.81	15.8	32.27	122
YOLOv5s	960×960	7.02	57.02	33.9	41.34	122
YOLOv5s	1 280×1 280	7.02	57.31	57.3	47.92	122
SMAM-YOLO	640×640	7.37	60.59	19.9	38.16	74
SMAM-YOLO	960×960	7.37	61.44	42.6	45.82	74
SMAM-YOLO	1 280×1 280	7.37	62.62	74.4	52.07	74
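The mAP₅₀ column counts a detection as correct only when its IoU with a ground-truth box is at least 0.5. The sketch below shows that matching criterion for axis-aligned boxes in `(x1, y1, x2, y2)` form. The box coordinates and the helper names `iou` and `is_true_positive` are illustrative assumptions, not taken from the paper or from any detection library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred, gt) >= threshold


# A hypothetical 10x10-pixel object and two predictions shifted to the right.
gt = (10, 10, 20, 20)
shifted_2px = (12, 10, 22, 20)  # overlap 8x10, IoU = 80 / 120
shifted_5px = (15, 10, 25, 20)  # overlap 5x10, IoU = 50 / 150
```

For a box only 10 pixels wide, a 2-pixel shift still passes the 0.5 threshold, but a 5-pixel shift does not, which is one reason localization errors weigh heavily on small objects.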

No.	Baseline	P2	SMAM	CSMAM	Layers	Parameters/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS/(frame·s⁻¹)
a	√	-	-	-	270	7.02	57.31	57.27	47.92	122
b	√	√	-	-	328	7.17	60.63	65.39	49.70	91
c	√	√	√	-	496	7.62	64.43	76.16	51.71	78
d	√	√	√	√	587	7.37	62.62	74.40	52.07	74
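The per-step contribution of each component in the ablation rows can be read off with simple arithmetic. The sketch below copies the mAP₅₀ and FPS values from rows a to d; the step labels such as "+P2" are shorthand introduced here for illustration.

```python
# (row, step label, mAP50 in %, FPS in frame/s), copied from ablation rows a-d.
ablation = [
    ("a", "baseline", 47.92, 122),
    ("b", "+P2", 49.70, 91),
    ("c", "+SMAM", 51.71, 78),
    ("d", "+CSMAM", 52.07, 74),
]

# Gain in percentage points and change in FPS relative to the previous row.
steps = []
for prev, cur in zip(ablation, ablation[1:]):
    map_gain = round(cur[2] - prev[2], 2)
    fps_change = cur[3] - prev[3]
    steps.append((cur[1], map_gain, fps_change))

total_gain = round(ablation[-1][2] - ablation[0][2], 2)

for label, map_gain, fps_change in steps:
    print(f"{label}: {map_gain:+.2f} points mAP50, {fps_change:+d} frame/s")
print(f"total: {total_gain:+.2f} points mAP50")
```

The three steps add 1.78, 2.01, and 0.36 percentage points, which sum to the 4.15-point improvement over the baseline.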

Model	Parameters/10⁶	Model size/MB	GFLOPs	mAP₅₀/%	FPS/(frame·s⁻¹)
CBAM	7.23	60.41	64.56	50.61	77.02
YOLOX-S	9.01	212.23	92.99	47.61	69.98
PP-YOLO-S	7.91	59.16	63.37	48.23	117.08
DETR	41.00	123.65	86.01	46.16	27.90
YOLOv7-tiny	6.02	48.58	47.38	45.23	131.21
YOLOv5s	7.02	60.28	60.28	50.02	63.29
SMAM-YOLO	7.37	62.62	74.42	52.07	74.07
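The margin of SMAM-YOLO over each compared model follows directly from the mAP₅₀ column. The sketch below is a small arithmetic check using the values from the comparison rows; the variable names are illustrative.

```python
# mAP50 (%) of the compared models, copied from the comparison rows.
compared = {
    "CBAM": 50.61,
    "YOLOX-S": 47.61,
    "PP-YOLO-S": 48.23,
    "DETR": 46.16,
    "YOLOv7-tiny": 45.23,
    "YOLOv5s": 50.02,
}
smam_yolo = 52.07

# Margin in percentage points, rounded to the table's two decimals.
margins = {name: round(smam_yolo - value, 2) for name, value in compared.items()}
smallest = min(margins.values())
largest = max(margins.values())

for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name}: +{margin:.2f} points")
```

The smallest margin (over CBAM) and the largest (over YOLOv7-tiny) match the 1.46 to 6.84 percentage-point range quoted in the abstract.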
