Single shot multibox detector recognition method for aerial targets of unmanned aerial vehicle

doi:10.11772/j.issn.1001-9081.2021010026

Abstract

Unmanned Aerial Vehicle (UAV) aerial images cover a wide field of view, so the targets in them are small and have blurred boundaries, and the existing Single Shot multibox Detector (SSD) target detection model struggles to detect such small targets accurately. To address the original model's tendency to miss detections, a new SSD model based on continuous upsampling was proposed, drawing on the Feature Pyramid Network (FPN). In the improved SSD model, the input image size was adjusted to $320 × 320$, the Conv3_3 feature layer was added, the high-level features were upsampled, and the features of the first five layers of the VGG16 network were fused through a feature pyramid structure to enhance the semantic representation ability of each feature layer. The anchor box sizes were also redesigned. Training and validation were carried out on the public aerial dataset UCAS-AOD. Experimental results show that the improved SSD model achieves a mean Average Precision (mAP) of 94.78% across categories, a relative improvement of 17.62% over the existing SSD model, with relative gains of 4.66% for the plane category and 34.78% for the car category.

Key words: aerial image, Convolutional Neural Network (CNN), target detection, Single Shot multibox Detector (SSD), feature fusion

CLC Number: TP183

Huaiyu ZHU, Bo LI. Single shot multibox detector recognition method for aerial targets of unmanned aerial vehicle[J]. Journal of Computer Applications, 2021, 41(11): 3234-3241.

Feature layer	SSD Min_size	SSD Max_size	CU-SSD Min_size	CU-SSD Max_size
Conv3_3	-	-	16	32
Conv4_3	30	60	32	64
fc7	60	111	64	118
Conv6_2	111	162	118	173
Conv7_2	152	213	162	227
Conv8_2	213	264	227	282
Conv9_2	264	315	282	336
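
The Min_size and Max_size values in the table above are read here as the prior (anchor) box size parameters of each feature layer. As a minimal sketch, assuming CU-SSD keeps the prior box rule of the original SSD (one square box of side Min_size, one square box of side sqrt(Min_size × Max_size), and a pair of rectangular boxes per extra aspect ratio), the box shapes per feature-map cell could be derived as follows. The helper name `prior_box_shapes` and the default aspect ratio of 2 are illustrative assumptions, not values reported by the paper.

```python
import math

# (Min_size, Max_size) per feature layer: the CU-SSD values from the table above.
CU_SSD_SIZES = {
    "Conv3_3": (16, 32),
    "Conv4_3": (32, 64),
    "fc7": (64, 118),
    "Conv6_2": (118, 173),
    "Conv7_2": (162, 227),
    "Conv8_2": (227, 282),
    "Conv9_2": (282, 336),
}


def prior_box_shapes(min_size, max_size, aspect_ratios=(2.0,)):
    """Return (width, height) pairs for the prior boxes of one feature-map cell.

    Uses the original SSD convention. The aspect ratios are an assumed
    default (hypothetical), not taken from the paper.
    """
    # Small square box with side Min_size.
    shapes = [(float(min_size), float(min_size))]
    # Larger square box whose side is the geometric mean of Min_size and Max_size.
    side = math.sqrt(min_size * max_size)
    shapes.append((side, side))
    # For each extra aspect ratio, a wide box and its tall counterpart of equal area.
    for ratio in aspect_ratios:
        root = math.sqrt(ratio)
        shapes.append((min_size * root, min_size / root))
        shapes.append((min_size / root, min_size * root))
    return shapes


for layer, (min_size, max_size) in CU_SSD_SIZES.items():
    rounded = [(round(w, 1), round(h, 1)) for w, h in prior_box_shapes(min_size, max_size)]
    print(layer, rounded)
```

Under these assumptions, the added Conv3_3 layer contributes the smallest boxes (starting at 16 on a side), which is consistent with the stated goal of detecting small targets.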

Category	Images	Instances
Total	1510	14596
Plane	1000	7482
Car	510	7114

Model	AP (car)/%	AP (plane)/%	mAP/%	Frame rate/(frame·s^-1)	Size/MB
SSD	69.33	91.84	80.58	13.6	91
FSSD	62.04	90.24	76.14	13.0	105
RFBNet	73.80	93.88	83.84	9.9	142
YOLOv3	86.07	94.06	90.07	12.0	235
CU-SSD	93.44	96.12	94.78	9.0	79
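
The 17.62%, 4.66%, and 34.78% improvements quoted in the abstract can be reproduced from this table if they are read as relative gains over the SSD baseline rather than as percentage-point differences. The short check below is a sketch under that reading; it also recomputes each mAP as the unweighted mean of the two per-class APs. The function names are illustrative.

```python
# Values (%) copied from the results table above: AP for car, AP for plane, reported mAP.
RESULTS = {
    "SSD": (69.33, 91.84, 80.58),
    "FSSD": (62.04, 90.24, 76.14),
    "RFBNet": (73.80, 93.88, 83.84),
    "YOLOv3": (86.07, 94.06, 90.07),
    "CU-SSD": (93.44, 96.12, 94.78),
}


def mean_ap(car_ap, plane_ap):
    """mAP as the unweighted mean of the two per-class APs."""
    return (car_ap + plane_ap) / 2


def relative_gain(baseline, improved):
    """Improvement over the baseline, expressed as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100


ssd_car, ssd_plane, ssd_map = RESULTS["SSD"]
cu_car, cu_plane, cu_map = RESULTS["CU-SSD"]

# Compare the recomputed mAP with the reported one for every model.
for model, (car, plane, reported) in RESULTS.items():
    print(f"{model}: recomputed mAP {mean_ap(car, plane):.3f}, reported {reported:.2f}")

# Relative gains of CU-SSD over the SSD baseline.
print(f"mAP gain:   {relative_gain(ssd_map, cu_map):.2f}%")
print(f"plane gain: {relative_gain(ssd_plane, cu_plane):.2f}%")
print(f"car gain:   {relative_gain(ssd_car, cu_car):.2f}%")
```

The recomputed mAP values agree with the reported ones to within rounding. In absolute terms, the same table gives CU-SSD a gain of 14.20 percentage points in mAP over SSD, at a lower frame rate (9.0 versus 13.6 frame/s) and a smaller model size (79 MB versus 91 MB).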