Object detection algorithm based on attention mechanism and context information

Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1557-1564. DOI: 10.11772/j.issn.1001-9081.2022040554

Section: Multimedia computing and computer simulation

Hui LIU¹,², Linyu ZHANG¹,², Fugang WANG¹,², Rujin HE¹,²

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Digital Intelligence Communication New Technology Application Research Center, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2022-04-19 Revised: 2022-06-20 Accepted: 2022-06-22 Online: 2022-07-11 Published: 2023-05-10
Contact: Linyu ZHANG (1075634172@qq.com)
About the authors:
LIU Hui, born in 1966 in Yilong, Sichuan, M. S., senior engineer. His research interests include computer vision, new technology of communication networks, and telecommunication system services.
ZHANG Linyu, born in 1997 in Shijiazhuang, Hebei, M. S. candidate. Her research interests include object detection.
WANG Fugang, born in 1997 in Tai'an, Shandong, M. S. candidate. His research interests include object detection.
HE Rujin, born in 1998 in Shaoyang, Hunan, M. S. candidate. Her research interests include abnormal behavior detection.

Abstract:

To address missed detections of small objects in object detection, an improved YOLOv5 (You Only Look Once) object detection algorithm based on an attention mechanism and multi-scale context information was proposed. Firstly, a Multiscale Dilated Separable Convolutional Module (MDSCM) was added to the feature extraction structure to extract multi-scale feature information, enlarging the receptive field while avoiding the loss of small object information. Secondly, an attention mechanism was added to the backbone network, and location-aware information was embedded in the channel information to further enhance the feature expression ability of the algorithm. Finally, Soft-NMS (Soft Non-Maximum Suppression) was used in place of the NMS (Non-Maximum Suppression) used by YOLOv5 to reduce the missed detection rate of the algorithm. Experimental results show that the improved algorithm achieves detection precisions of 82.80%, 71.74% and 77.11% on the PASCAL VOC dataset, the DOTA aerial image dataset and the DIOR optical remote sensing dataset respectively, which are 3.70, 1.49 and 2.48 percentage points higher than those of YOLOv5, and that it detects small objects in images better. Therefore, the improved YOLOv5 can be better applied to small object detection scenarios.

Key words: object detection, depthwise separable convolution, dilated convolution, attention mechanism, Non-Maximum Suppression (NMS)
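The abstract states that Soft-NMS replaces the NMS used by YOLOv5 in order to reduce missed detections. The sketch below illustrates the general idea in plain Python, assuming the Gaussian decay variant of Soft-NMS with `sigma=0.5`; the page does not say which variant or which parameters the authors used, so the function names, the toy boxes and all thresholds here are illustrative assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: drop any box whose IoU with a kept box exceeds the threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns (index, final_score) pairs in the order the boxes were selected.
    sigma and score_thresh are assumed values, not taken from the paper.
    """
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining)  # highest current score (ties go to the larger index)
        remaining.remove(best)
        best_score, best_idx = best
        kept.append((best_idx, best_score))
        decayed = []
        for s, i in remaining:
            overlap = iou(boxes[best_idx], boxes[i])
            s = s * math.exp(-(overlap ** 2) / sigma)  # heavier overlap, stronger decay
            if s >= score_thresh:
                decayed.append((s, i))
        remaining = decayed
    return kept


# Toy example: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print(hard_nms(boxes, scores))   # box 1 is discarded outright
print(soft_nms(boxes, scores))   # box 1 survives with a reduced score
```

In the toy example, hard NMS deletes the second of the two overlapping boxes, while Soft-NMS keeps it at a lower confidence. If that box belonged to a second, partly occluded object, hard NMS would have produced a missed detection, which is the failure mode the abstract refers to.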

CLC number: TP391.41

Cite this article: Hui LIU, Linyu ZHANG, Fugang WANG, Rujin HE. Object detection algorithm based on attention mechanism and context information[J]. Journal of Computer Applications, 2023, 43(5): 1557-1564.

Configuration item	Training	Testing
Programming language	Python	Python
Deep learning framework	PyTorch 1.8.0	PyTorch 1.8.0
Operating system	Windows 10	Windows 10
CPU	Core i9-10980XE	Core i5-11400F
Memory	128 GB	16 GB
GPU	NVIDIA RTX 3080	NVIDIA RTX 3060
CUDA	11.1	11.1

Algorithm	mAP/%	FPS
YOLOv5	79.10	108
YOLOv5+MDSCM	80.00	91
YOLOv5+CA	81.10	108
YOLOv5+GCA	81.40	108
YOLOv5+Soft-NMS	79.60	104
YOLOv5+MDSCM+GCA	81.70	91
YOLOv5+MDSCM+Soft-NMS	80.90	90
YOLOv5+GCA+Soft-NMS	82.00	106
AC-YOLO	82.80	90
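According to the abstract, the MDSCM rows in the table above add a module built from dilated and depthwise separable convolutions to enlarge the receptive field. The sketch below shows only the standard arithmetic behind those two building blocks; the 3×3 kernel, the dilation rates 1, 2, 3 and the 256-channel width are illustrative assumptions and are not the paper's MDSCM configuration, which this page does not give.

```python
def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Assumed example values, chosen only for illustration.
k, channels = 3, 256
for d in (1, 2, 3):
    e = effective_kernel(k, d)
    print(f"dilation {d}: a {k}x{k} kernel covers {e}x{e} pixels")

std = standard_conv_params(channels, channels, k)
sep = depthwise_separable_params(channels, channels, k)
print(f"standard: {std} weights, depthwise separable: {sep} weights")
print(f"ratio: {sep / std:.3f}")
```

Dilation widens the area a kernel sees without adding weights, and the depthwise separable factorization cuts the weight count of each branch, which is why the two are commonly combined when several receptive-field sizes are needed in parallel.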

Network	Input size	mAP/%	FPS
Faster RCNN	640×640	73.32	5
SSD	640×640	77.66	54
YOLOv3	640×640	72.34	60
Tiny-YOLOv3	640×640	73.28	91
YOLOv5	640×640	79.10	108
AC-YOLO	640×640	82.80	90
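One way to read the comparison above is as a speed and accuracy trade-off. The sketch below, using only the mAP and FPS values from the table, converts FPS to per-frame latency and lists the networks that no other network beats on both axes at once. The helper names are hypothetical, and the latency figure is simply 1000/FPS, not a separately measured value.

```python
# (network, mAP in %, FPS) copied from the comparison table above
results = [
    ("Faster RCNN", 73.32, 5),
    ("SSD", 77.66, 54),
    ("YOLOv3", 72.34, 60),
    ("Tiny-YOLOv3", 73.28, 91),
    ("YOLOv5", 79.10, 108),
    ("AC-YOLO", 82.80, 90),
]


def latency_ms(fps):
    """Average time per frame in milliseconds implied by a frames-per-second figure."""
    return 1000.0 / fps


def pareto_front(rows):
    """Names of rows that are not dominated, i.e. no other row is at least as good
    on both mAP and FPS and strictly better on one of them."""
    front = []
    for name, m, f in rows:
        dominated = any(
            (m2 >= m and f2 >= f) and (m2 > m or f2 > f)
            for _, m2, f2 in rows
        )
        if not dominated:
            front.append(name)
    return front


for name, m, f in results:
    print(f"{name:12s} mAP {m:5.2f}%  {latency_ms(f):7.2f} ms/frame")

print("Not dominated:", pareto_front(results))
extra_ms = latency_ms(90) - latency_ms(108)
print(f"AC-YOLO costs about {extra_ms:.2f} ms more per frame than YOLOv5")
```

On these numbers only YOLOv5 and AC-YOLO remain undominated: AC-YOLO trades roughly 2 ms per frame for 3.70 points of mAP.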

Per-class AP (IoU=0.5), in %, PASCAL VOC classes
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
Aero	81.20	75.50	87.70	89.20
Bike	80.30	80.20	89.30	91.00
Bird	74.00	72.30	74.30	80.80
Boat	65.50	66.30	70.80	73.90
Bottle	64.10	47.60	71.60	71.80
Bus	81.50	83.00	85.50	89.60
Car	82.20	84.20	91.70	92.00
Cat	83.10	86.10	83.20	89.70
Chair	61.23	54.70	61.90	67.70
Cow	77.30	78.30	82.00	85.90
Table	75.20	73.90	73.80	77.50
Dog	82.20	84.50	81.00	88.00
Horse	84.69	85.30	87.90	91.40
Mbike	81.29	82.60	86.60	89.10
Person	78.46	76.20	86.60	88.50
Plant	52.18	48.60	52.40	57.80
Sheep	77.52	73.90	81.70	84.70
Sofa	74.41	76.00	70.80	78.20
Train	81.66	83.40	83.50	87.30
TV	71.99	74.00	79.80	82.00
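The per-class values above can be tied back to the headline figures, since mAP is the mean of the per-class APs. The sketch below averages the YOLOv5 and proposed-algorithm columns and compares them with the 79.10% and 82.80% reported for PASCAL VOC; reading this table as the VOC results is an inference from the 20 class names, and the variable names are illustrative.

```python
from statistics import fmean

classes = ["Aero", "Bike", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat",
           "Chair", "Cow", "Table", "Dog", "Horse", "Mbike", "Person",
           "Plant", "Sheep", "Sofa", "Train", "TV"]

# AP (IoU=0.5) in %, copied column by column from the table above
ap_yolov5 = [87.70, 89.30, 74.30, 70.80, 71.60, 85.50, 91.70, 83.20,
             61.90, 82.00, 73.80, 81.00, 87.90, 86.60, 86.60,
             52.40, 81.70, 70.80, 83.50, 79.80]
ap_proposed = [89.20, 91.00, 80.80, 73.90, 71.80, 89.60, 92.00, 89.70,
               67.70, 85.90, 77.50, 88.00, 91.40, 89.10, 88.50,
               57.80, 84.70, 78.20, 87.30, 82.00]

# mAP is the unweighted mean over classes
map_yolov5 = fmean(ap_yolov5)
map_proposed = fmean(ap_proposed)
print(f"YOLOv5 mAP:   {map_yolov5:.2f}")
print(f"Proposed mAP: {map_proposed:.2f}")

# Per-class gain of the proposed algorithm over YOLOv5, in percentage points
gains = {c: round(p - b, 2) for c, b, p in zip(classes, ap_yolov5, ap_proposed)}
best_class = max(gains, key=gains.get)
print(f"Largest gain: {best_class} (+{gains[best_class]})")
```

Both averages agree with the reported mAP values to within rounding, and every class improves, with the largest gain on Sofa.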

Per-class AP (IoU=0.5), in %, DOTA classes
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
mAP	67.97	41.98	70.25	71.74
Small-vehicle	66.80	10.05	66.30	67.80
Large-vehicle	81.70	50.20	83.60	85.90
Plane	86.20	64.70	90.80	91.80
Storage-tank	69.90	57.90	69.70	74.90
Ship	84.30	31.30	86.80	88.10
Harbor	80.50	80.50	84.00	82.20
Ground track-field	56.70	24.90	61.90	59.30
Soccer ball field	51.70	22.70	52.50	55.60
Tennis-court	89.40	85.50	94.00	93.40
Swimming pool	60.30	18.50	62.90	64.10
Baseball diamond	73.60	38.20	76.90	74.00
Roundabout	50.30	44.50	58.00	59.40
Basketball court	60.40	62.50	64.40	66.20
Bridge	46.10	26.20	47.80	50.40
Helicopter	51.80	12.10	54.20	63.00
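The overall gain in this table is not uniform across classes. The sketch below, working only from the YOLOv5 and proposed-algorithm columns above, splits the classes into those that improve and those that regress; the variable names are illustrative.

```python
# class: (YOLOv5 AP, proposed-algorithm AP), in %, copied from the table above
ap = {
    "Small-vehicle": (66.30, 67.80),
    "Large-vehicle": (83.60, 85.90),
    "Plane": (90.80, 91.80),
    "Storage-tank": (69.70, 74.90),
    "Ship": (86.80, 88.10),
    "Harbor": (84.00, 82.20),
    "Ground track-field": (61.90, 59.30),
    "Soccer ball field": (52.50, 55.60),
    "Tennis-court": (94.00, 93.40),
    "Swimming pool": (62.90, 64.10),
    "Baseball diamond": (76.90, 74.00),
    "Roundabout": (58.00, 59.40),
    "Basketball court": (64.40, 66.20),
    "Bridge": (47.80, 50.40),
    "Helicopter": (54.20, 63.00),
}

# Change in AP, in percentage points, rounded to hide floating-point noise
delta = {name: round(new - old, 2) for name, (old, new) in ap.items()}

improved = sorted((n for n, d in delta.items() if d > 0), key=delta.get, reverse=True)
regressed = sorted((n for n, d in delta.items() if d < 0), key=delta.get)

print(f"{len(improved)} classes improve, led by {improved[0]} ({delta[improved[0]]:+})")
print(f"{len(regressed)} classes regress:")
for name in regressed:
    print(f"  {name}: {delta[name]:+}")
```

Eleven of the fifteen classes improve, with Helicopter gaining the most, while Harbor, Ground track-field, Tennis-court and Baseball diamond lose between 0.6 and 2.9 points.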

Per-class AP (IoU=0.5), in %, DIOR classes
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
mAP	58.63	51.58	74.63	77.11
Airplane	59.60	49.40	89.10	93.10
Airport	72.70	63.10	78.10	80.90
Baseball field	73.40	66.60	81.90	79.90
Basketball court	75.70	71.10	80.00	84.40
Bridge	29.70	26.50	69.20	76.00
Chimney	65.60	63.30	89.70	81.70
Dam	56.60	54.30	73.10	77.10
Expressway service area	63.50	62.70	70.50	67.60
Expressway toll station	53.10	46.60	58.50	70.00
Golf course	65.30	64.40	70.30	66.70
Ground track field	68.60	53.10	66.20	75.70
Harbor	49.40	44.20	69.30	75.50
Overpass	48.10	35.70	78.80	76.70
Ship	59.20	58.30	80.30	87.00
Stadium	61.00	41.10	60.90	65.80
Storage tank	46.60	72.60	70.40	70.10
Tennis court	76.30	37.50	85.20	88.70
Train station	55.10	22.70	66.80	63.50
Vehicle	27.40	47.10	76.70	81.20
Wind mill	65.70	51.20	77.50	80.50
