Object detection algorithm based on attention mechanism and context information

doi:10.11772/j.issn.1001-9081.2022040554

Journal of Computer Applications, 2023, Vol. 43, Issue (5): 1557-1564. DOI: 10.11772/j.issn.1001-9081.2022040554

Special Issue: Multimedia computing and computer simulation

Hui LIU¹,², Linyu ZHANG¹,², Fugang WANG¹,², Rujin HE¹,²

¹ School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
² Digital Intelligence Communication New Technology Application Research Center, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2022-04-19  Revised: 2022-06-20  Accepted: 2022-06-22  Online: 2022-07-11  Published: 2023-05-10
Contact: Linyu ZHANG (1075634172@qq.com)
About the authors: LIU Hui, born in 1966 in Yilong, Sichuan, M. S., senior engineer. His research interests include computer vision, new communication network technologies, and telecommunication system services.
ZHANG Linyu, born in 1997 in Shijiazhuang, Hebei, M. S. candidate. Her research interests include object detection.
WANG Fugang, born in 1997 in Tai'an, Shandong, M. S. candidate. His research interests include object detection.
HE Rujin, born in 1998 in Shaoyang, Hunan, M. S. candidate. Her research interests include abnormal behavior detection.

Abstract:

To address missed detections of small objects in object detection, an improved YOLOv5 (You Only Look Once) object detection algorithm based on an attention mechanism and multi-scale context information was proposed. Firstly, a Multiscale Dilated Separable Convolutional Module (MDSCM) was added to the feature extraction structure to extract multi-scale feature information, enlarging the receptive field while avoiding the loss of small-object information. Secondly, an attention mechanism was added to the backbone network, embedding location-aware information into the channel information to further enhance the feature representation ability of the algorithm. Finally, Soft-NMS (Soft Non-Maximum Suppression) was used in place of the NMS (Non-Maximum Suppression) used by YOLOv5 to reduce the missed detection rate of the algorithm. Experimental results show that the improved algorithm achieves detection precisions of 82.80%, 71.74% and 77.11% on the PASCAL VOC dataset, the DOTA aerial image dataset and the DIOR optical remote sensing dataset respectively, which are 3.70, 1.49 and 2.48 percentage points higher than those of YOLOv5, and that it detects small objects better. Therefore, the improved YOLOv5 is better suited to small object detection scenarios in practice.
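The Soft-NMS step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not say which decay function or hyperparameters were used, so the Gaussian decay, `sigma=0.5` and `score_thresh=0.001` below are assumptions chosen for illustration, and the boxes and scores are made-up toy values rather than data from the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (sigma and score_thresh are illustrative assumptions).

    Returns (index, decayed_score) pairs in the order the boxes are selected.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current (possibly decayed) score.
        best_pos = max(range(len(remaining)), key=lambda p: remaining[p][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            # Decay the score by overlap instead of discarding the box outright.
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                decayed.append((idx, score))
        remaining = decayed
    return kept


# Toy input: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
for idx, score in kept:
    print(idx, round(score, 4))
```

In this toy case a hard NMS with an IoU threshold of 0.5 would discard box 1 entirely, whereas the sketch keeps it with a reduced score, which is the behavior that lets overlapping true positives survive.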

Key words: object detection, depthwise separable convolution, dilated convolution, attention mechanism, Non-Maximum Suppression (NMS)
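The two convolution types named in the key words trade off receptive field and parameter count in a way that is easy to check by hand. The sketch below uses the standard formulas for the effective kernel size of a dilated convolution and for the weight counts of standard versus depthwise separable convolutions (biases ignored). The dilation rates (1, 2, 3) and the 256-channel width are hypothetical values picked for illustration; the page does not state the settings used inside MDSCM.

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective side length of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Hypothetical dilation rates; the weight count of a 3x3 kernel does not
# change with dilation, only the area it covers does.
for d in (1, 2, 3):
    print(f"dilation {d}: effective kernel {effective_kernel(3, d)}")

# Hypothetical channel width of 256 in and out.
print(standard_conv_params(3, 256, 256), depthwise_separable_params(3, 256, 256))
```

Under these assumed sizes the separable form needs roughly one ninth of the weights, which is why stacking several dilated branches in parallel stays affordable.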

CLC Number: TP391.41

Hui LIU, Linyu ZHANG, Fugang WANG, Rujin HE. Object detection algorithm based on attention mechanism and context information[J]. Journal of Computer Applications, 2023, 43(5): 1557-1564.

Configuration	Training	Testing
Programming language	Python	Python
Deep learning framework	PyTorch 1.8.0	PyTorch 1.8.0
Operating system	Windows 10	Windows 10
CPU	Core i9-10980XE	Core i5-11400F
Memory	128 GB	16 GB
GPU	Nvidia RTX 3080	Nvidia RTX 3060
CUDA	11.1	11.1

Algorithm	mAP/%	FPS
YOLOv5	79.10	108
YOLOv5+MDSCM	80.00	91
YOLOv5+CA	81.10	108
YOLOv5+GCA	81.40	108
YOLOv5+Soft-NMS	79.60	104
YOLOv5+MDSCM+GCA	81.70	91
YOLOv5+MDSCM+Soft-NMS	80.90	90
YOLOv5+GCA+Soft-NMS	82.00	106
AC-YOLO	82.80	90
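The ablation rows above are easier to compare as differences from the YOLOv5 baseline. The following sketch copies the mAP and FPS values from the table and reports, for each variant, the mAP gain in percentage points and the change in FPS. The function name `gains_over_baseline` is a hypothetical helper introduced here, not something from the paper.

```python
# (algorithm, mAP in %, FPS) copied from the ablation table; row 0 is the baseline.
ablation = [
    ("YOLOv5", 79.10, 108),
    ("YOLOv5+MDSCM", 80.00, 91),
    ("YOLOv5+CA", 81.10, 108),
    ("YOLOv5+GCA", 81.40, 108),
    ("YOLOv5+Soft-NMS", 79.60, 104),
    ("YOLOv5+MDSCM+GCA", 81.70, 91),
    ("YOLOv5+MDSCM+Soft-NMS", 80.90, 90),
    ("YOLOv5+GCA+Soft-NMS", 82.00, 106),
    ("AC-YOLO", 82.80, 90),
]


def gains_over_baseline(rows):
    """Return (name, mAP gain in points, FPS change) relative to the first row."""
    _, base_map, base_fps = rows[0]
    return [
        (name, round(m_ap - base_map, 2), fps - base_fps)
        for name, m_ap, fps in rows[1:]
    ]


summary = gains_over_baseline(ablation)
for name, d_map, d_fps in summary:
    print(f"{name:24s} mAP {d_map:+.2f}  FPS {d_fps:+d}")
```

Read this way, the table suggests that the FPS drop is tied to the rows containing MDSCM, while the attention variants leave FPS unchanged in these measurements.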

Network	Input size	mAP/%	FPS
Faster R-CNN	640×640	73.32	5
SSD	640×640	77.66	54
YOLOv3	640×640	72.34	60
Tiny-YOLOv3	640×640	73.28	91
YOLOv5	640×640	79.10	108
AC-YOLO	640×640	82.80	90

AP/% (IoU=0.5) per class
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
Aero	81.20	75.50	87.70	89.20
Bike	80.30	80.20	89.30	91.00
Bird	74.00	72.30	74.30	80.80
Boat	65.50	66.30	70.80	73.90
Bottle	64.10	47.60	71.60	71.80
Bus	81.50	83.00	85.50	89.60
Car	82.20	84.20	91.70	92.00
Cat	83.10	86.10	83.20	89.70
Chair	61.23	54.70	61.90	67.70
Cow	77.30	78.30	82.00	85.90
Table	75.20	73.90	73.80	77.50
Dog	82.20	84.50	81.00	88.00
Horse	84.69	85.30	87.90	91.40
Mbike	81.29	82.60	86.60	89.10
Person	78.46	76.20	86.60	88.50
Plant	52.18	48.60	52.40	57.80
Sheep	77.52	73.90	81.70	84.70
Sofa	74.41	76.00	70.80	78.20
Train	81.66	83.40	83.50	87.30
TV	71.99	74.00	79.80	82.00
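This table lists no mAP row, but mAP is the unweighted mean of the per-class AP values, so it can be recomputed as a consistency check. The sketch below copies the YOLOv5 and proposed-algorithm columns from the table and compares their means with the 79.10% and 82.80% figures reported elsewhere on this page. The 20 class names look like the PASCAL VOC categories, which is an inference rather than something the table states.

```python
# Per-class AP (%) copied from the table above, in row order (Aero ... TV).
voc_ap = {
    "yolov5": [87.70, 89.30, 74.30, 70.80, 71.60, 85.50, 91.70, 83.20, 61.90, 82.00,
               73.80, 81.00, 87.90, 86.60, 86.60, 52.40, 81.70, 70.80, 83.50, 79.80],
    "proposed": [89.20, 91.00, 80.80, 73.90, 71.80, 89.60, 92.00, 89.70, 67.70, 85.90,
                 77.50, 88.00, 91.40, 89.10, 88.50, 57.80, 84.70, 78.20, 87.30, 82.00],
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# mAP values reported in the ablation and comparison tables.
reported = {"yolov5": 79.10, "proposed": 82.80}
for name, values in voc_ap.items():
    print(name, round(mean_ap(values), 3), "reported:", reported[name])
```

Both recomputed means fall within rounding distance of the reported values, which supports reading this table as the per-class breakdown behind those two mAP figures.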

AP/% (IoU=0.5) per class
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
mAP	67.97	41.98	70.25	71.74
Small-vehicle	66.80	10.05	66.30	67.80
Large-vehicle	81.70	50.20	83.60	85.90
Plane	86.20	64.70	90.80	91.80
Storage-tank	69.90	57.90	69.70	74.90
Ship	84.30	31.30	86.80	88.10
Harbor	80.50	80.50	84.00	82.20
Ground track-field	56.70	24.90	61.90	59.30
Soccer ball field	51.70	22.70	52.50	55.60
Tennis-court	89.40	85.50	94.00	93.40
Swimming pool	60.30	18.50	62.90	64.10
Baseball diamond	73.60	38.20	76.90	74.00
Roundabout	50.30	44.50	58.00	59.40
Basketball court	60.40	62.50	64.40	66.20
Bridge	46.10	26.20	47.80	50.40
Helicopter	51.80	12.10	54.20	63.00
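An overall mAP gain can hide classes that get worse, so a per-class difference against YOLOv5 is a useful way to read the table above. The sketch below copies the YOLOv5 and proposed-algorithm columns (the mAP of 71.74% matches the DOTA figure in the abstract) and lists which classes regress and which one improves most. Variable names are illustrative only.

```python
# (class, YOLOv5 AP %, proposed AP %) copied from the table above, in row order.
dota_ap = [
    ("Small-vehicle", 66.30, 67.80),
    ("Large-vehicle", 83.60, 85.90),
    ("Plane", 90.80, 91.80),
    ("Storage-tank", 69.70, 74.90),
    ("Ship", 86.80, 88.10),
    ("Harbor", 84.00, 82.20),
    ("Ground track-field", 61.90, 59.30),
    ("Soccer ball field", 52.50, 55.60),
    ("Tennis-court", 94.00, 93.40),
    ("Swimming pool", 62.90, 64.10),
    ("Baseball diamond", 76.90, 74.00),
    ("Roundabout", 58.00, 59.40),
    ("Basketball court", 64.40, 66.20),
    ("Bridge", 47.80, 50.40),
    ("Helicopter", 54.20, 63.00),
]

# AP change in percentage points, proposed minus YOLOv5.
deltas = [(name, round(ours - base, 2)) for name, base, ours in dota_ap]
regressions = [name for name, d in deltas if d < 0]
best = max(deltas, key=lambda item: item[1])

print("classes where the proposed algorithm scores lower:", regressions)
print("largest gain:", best)
```

In these numbers the classes that score lower than YOLOv5 are large ground facilities, while small or cluttered targets such as helicopters and storage tanks gain the most, which is consistent with the small-object focus described in the abstract.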

AP/% (IoU=0.5) per class
Class	YOLOv3	SSD	YOLOv5	Proposed algorithm
mAP	58.63	51.58	74.63	77.11
Airplane	59.60	49.40	89.10	93.10
Airport	72.70	63.10	78.10	80.90
Baseball field	73.40	66.60	81.90	79.90
Basketball court	75.70	71.10	80.00	84.40
Bridge	29.70	26.50	69.20	76.00
Chimney	65.60	63.30	89.70	81.70
Dam	56.60	54.30	73.10	77.10
Expressway service area	63.50	62.70	70.50	67.60
Expressway toll station	53.10	46.60	58.50	70.00
Golf course	65.30	64.40	70.30	66.70
Ground track field	68.60	53.10	66.20	75.70
Harbor	49.40	44.20	69.30	75.50
Overpass	48.10	35.70	78.80	76.70
Ship	59.20	58.30	80.30	87.00
Stadium	61.00	41.10	60.90	65.80
Storage tank	46.60	72.60	70.40	70.10
Tennis court	76.30	37.50	85.20	88.70
Train station	55.10	22.70	66.80	63.50
Vehicle	27.40	47.10	76.70	81.20
Wind mill	65.70	51.20	77.50	80.50
