Single shot multibox detector recognition method for aerial targets of unmanned aerial vehicle

doi:10.11772/j.issn.1001-9081.2021010026

Journal of Computer Applications, 2021, Vol. 41, Issue (11): 3234-3241.

Huaiyu ZHU¹, Bo LI²

¹School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China
²College of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan Guangdong 528400, China

Received: 2021-01-07; Revised: 2021-02-03; Accepted: 2021-03-23; Online: 2021-04-15; Published: 2021-11-10
Corresponding author: Bo LI
About the authors: ZHU Huaiyu, born in 1995, M. S. candidate. His research interests include machine vision and artificial intelligence.
LI Bo, born in 1977, Ph. D., associate professor. His research interests include machine vision inspection and industrial automation.

Abstract:

Unmanned Aerial Vehicle (UAV) aerial images have a wide field of view, and the targets in them are small with blurred boundaries, so the existing Single Shot multibox Detector (SSD) model has difficulty detecting small targets in aerial images accurately. To reduce the missed detections of the original model, an SSD model based on continuous upsampling (CU-SSD) was proposed, drawing on the Feature Pyramid Network (FPN). In the improved SSD model, the input image size was changed to $320 \times 320$, a Conv3_3 feature layer was added, the high-level features were upsampled, and the features of the first five feature layers of the VGG16 network were fused through a feature pyramid structure, so as to enhance the semantic representation ability of each feature layer. Meanwhile, the sizes of the anchor boxes were redesigned. The model was trained and validated on the public aerial dataset UCAS-AOD. Experimental results show that the improved SSD model achieves a mean Average Precision (mAP) of 94.78% over all categories, a relative improvement of 17.62% over the existing SSD model (80.58% mAP), with relative gains of 4.66% for the plane category and 34.78% for the car category.

Key words: aerial image, Convolutional Neural Network (CNN), target detection, Single Shot multibox Detector (SSD), feature fusion

CLC number: TP183

Huaiyu ZHU, Bo LI. Single shot multibox detector recognition method for aerial targets of unmanned aerial vehicle[J]. Journal of Computer Applications, 2021, 41(11): 3234-3241.

Figures and tables

Tab. 1 Comparison of mAP and frame rate of different target detection models on PASCAL VOC2007 dataset

Detection model	mAP/%	Frame rate/(frame·s⁻¹)
R-CNN	66	0.02
Fast R-CNN	70	0.4
Faster R-CNN	73	7
YOLO	66	21
SSD300	77	46
SSD512	80	19

Fig. 1 SSD model structure
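SSD predicts from several feature layers, each tiled with default (anchor) boxes. The sketch below generates the default boxes of one layer following the scheme described in the original SSD paper: a square box of scale $s_k$, an extra square of scale $\sqrt{s_k s_{k+1}}$, and, for each aspect ratio $a$, a box of width $s_k\sqrt{a}$ and height $s_k/\sqrt{a}$ plus its transpose. It is an illustration under assumptions, not the authors' code: the function name is hypothetical, the feature-map sizes and per-layer aspect ratios in `SSD300_LAYERS` are assumed from the standard SSD300 configuration, and the min/max sizes are taken from the SSD row of Tab. 2.

```python
import math


def default_boxes(fmap_size, img_size, min_size, max_size, aspect_ratios):
    """Generate SSD-style default boxes (cx, cy, w, h), normalised to [0, 1]."""
    s_k = min_size / img_size                          # scale of this layer
    s_extra = math.sqrt(s_k * (max_size / img_size))   # extra square between s_k and s_{k+1}
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell centre
            boxes.append((cx, cy, s_k, s_k))
            boxes.append((cx, cy, s_extra, s_extra))
            for ar in aspect_ratios:
                r = math.sqrt(ar)
                boxes.append((cx, cy, s_k * r, s_k / r))  # wide box, ratio ar
                boxes.append((cx, cy, s_k / r, s_k * r))  # tall box, ratio 1/ar
    return boxes


# Assumed SSD300 layout: (feature map size, min_size, max_size, aspect ratios).
SSD300_LAYERS = [
    (38, 30, 60, [2]),
    (19, 60, 111, [2, 3]),
    (10, 111, 162, [2, 3]),
    (5, 152, 213, [2, 3]),
    (3, 213, 264, [2]),
    (1, 264, 315, [2]),
]

total = sum(len(default_boxes(f, 300, lo, hi, ars)) for f, lo, hi, ars in SSD300_LAYERS)
print(total)  # 8732
```

With this layout each position holds 4 or 6 boxes, giving 8732 default boxes in total for a 300 × 300 input.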

Fig. 2 IoU calculation
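IoU is the ratio of the overlap area of two boxes to the area of their union, $\mathrm{IoU}(A,B) = |A \cap B| / |A \cup B|$. In the original SSD, a default box is matched to a ground-truth box when this value exceeds 0.5. A minimal sketch, assuming boxes are given as corner coordinates (x1, y1, x2, y2); the function name is illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so about 0.142857
```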

Fig. 3 CU-SSD model structure
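Moving the input from 300 × 300 to 320 × 320 changes the resolution of every prediction layer. The sketch below computes these sizes under the layer settings of the widely used VGG16-based SSD (3 × 3 convolutions with padding 1 in the backbone, ceil-mode pool3, a stride-1 pool5, stride-2 Conv6_2/Conv7_2 and unpadded Conv8_2/Conv9_2). These settings are an assumption; this section does not list CU-SSD's exact layer parameters, and the helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Spatial size after a convolution (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool2x2_out(size, ceil_mode=False):
    """Spatial size after 2x2 max pooling with stride 2."""
    if ceil_mode:
        return math.ceil((size - 2) / 2) + 1
    return (size - 2) // 2 + 1


def feature_sizes(img):
    """Spatial sizes of the SSD prediction layers for a square input (assumed VGG16-SSD layout)."""
    s = pool2x2_out(pool2x2_out(img))   # pool1, pool2; 3x3/pad-1 convs keep the size
    sizes = {"Conv3_3": s}
    s = pool2x2_out(s, ceil_mode=True)  # pool3 in ceil mode, as in common SSD ports
    sizes["Conv4_3"] = s
    s = pool2x2_out(s)                  # pool4; pool5 is 3x3/stride 1, so fc6/fc7 keep this size
    sizes["fc7"] = s
    s = conv_out(s, 3, stride=2, pad=1)
    sizes["Conv6_2"] = s
    s = conv_out(s, 3, stride=2, pad=1)
    sizes["Conv7_2"] = s
    s = conv_out(s, 3)
    sizes["Conv8_2"] = s
    s = conv_out(s, 3)
    sizes["Conv9_2"] = s
    return sizes


for img in (300, 320):
    print(img, feature_sizes(img))
```

Under these assumptions, the first five prediction layers are 80, 40, 20, 10 and 5 pixels wide for a 320 × 320 input, each exactly twice the next, so a 2× upsampling of one layer matches the size of the layer below it. For a 300 × 300 input they are 75, 38, 19, 10 and 5, which do not halve exactly.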

Fig. 4 Upsampling results on feature layer

Fig. 5 Feature fusion methods

Fig. 6 Feature fusion module
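The fusion of upsampled high-level features into lower layers can be sketched in the FPN style: project each layer to a common channel width with a 1 × 1 convolution, upsample the coarser fused map by 2× with nearest-neighbour interpolation, and add the two element-wise. This is an assumption for illustration: whether the module in Fig. 6 adds or concatenates features, and which upsampling it uses, is not stated here, and FPN's 3 × 3 smoothing convolution is omitted. The random 1 × 1 weights are placeholders for learned parameters, and the channel counts and spatial sizes (standard VGG16-SSD layers Conv3_3, Conv4_3, fc7, Conv6_2 and Conv7_2 at a 320 × 320 input) are assumed.

```python
import numpy as np


def upsample_nearest_2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution: a per-pixel linear map over channels, weight shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def top_down_fuse(features, out_channels, rng):
    """FPN-style top-down fusion; features ordered from highest to lowest resolution."""
    # Hypothetical random lateral weights stand in for learned parameters.
    laterals = [conv1x1(f, 0.01 * rng.standard_normal((out_channels, f.shape[0])))
                for f in features]
    fused = [laterals[-1]]                    # the coarsest layer is used as-is
    for lat in reversed(laterals[:-1]):
        fused.append(lat + upsample_nearest_2x(fused[-1]))  # element-wise addition
    return fused[::-1]                        # back to high-to-low resolution order


rng = np.random.default_rng(0)
shapes = [(256, 80, 80), (512, 40, 40), (1024, 20, 20), (512, 10, 10), (256, 5, 5)]
features = [rng.standard_normal(s).astype(np.float32) for s in shapes]
fused = top_down_fuse(features, 256, rng)
print([f.shape for f in fused])
```

Element-wise addition requires the upsampled map and the lateral map to have identical shapes, which is why every layer is first projected to the same channel width.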

Fig. 7 Heat map of Conv3_3 layer
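The caption does not say how the Conv3_3 heat map was produced. One simple way to visualise a layer is to collapse its channels into a single map, either by averaging them or, as in class activation mapping (CAM), by weighting them with classifier weights, and then to rescale the result to [0, 1] for display. The sketch below assumes that approach; the function name and the toy feature map are hypothetical.

```python
import numpy as np


def activation_heatmap(feature, weights=None):
    """Collapse a (C, H, W) feature map into an (H, W) heat map scaled to [0, 1].

    weights: optional per-channel weights (e.g. class weights as in CAM);
    when omitted, channels are simply averaged.
    """
    if weights is None:
        weights = np.full(feature.shape[0], 1.0 / feature.shape[0])
    heat = np.tensordot(weights, feature, axes=1)  # weighted sum over channels -> (H, W)
    heat = np.maximum(heat, 0.0)                   # keep positive evidence only
    span = heat.max() - heat.min()
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)


fmap = np.zeros((2, 3, 3))
fmap[0, 1, 1], fmap[1, 1, 1] = 4.0, 2.0  # one strongly activated location
hm = activation_heatmap(fmap)
print(hm)
```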

Tab. 2 Sizes of anchor boxes

Feature layer	SSD Min_size	SSD Max_size	CU-SSD Min_size	CU-SSD Max_size
Conv3_3	—	—	16	32
Conv4_3	30	60	32	64
fc7	60	111	64	118
Conv6_2	111	162	118	173
Conv7_2	152	213	162	227
Conv8_2	213	264	227	282
Conv9_2	264	315	282	336
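The sizes in Tab. 2 appear to follow the ratio schedule of the original Caffe SSD scripts, in which consecutive layers use boundaries of 20%, 37%, 54%, 71%, 88% and 105% of the input side, with smaller boundaries prepended for the shallow layers. The sketch below assumes that schedule, with 10% prepended for SSD300 and 5% and 10% prepended for the 320 × 320 CU-SSD; the function and constant names are hypothetical.

```python
def anchor_sizes(min_dim, boundaries):
    """(min_size, max_size) per layer; consecutive boundaries are percentages of min_dim."""
    return [(round(min_dim * lo / 100), round(min_dim * hi / 100))
            for lo, hi in zip(boundaries, boundaries[1:])]


# Assumed Caffe-SSD-style schedule: 20% to 88% in steps of 17%, plus a closing 105% boundary.
SCHEDULE = list(range(20, 106, 17))  # [20, 37, 54, 71, 88, 105]

ssd300 = anchor_sizes(300, [10] + SCHEDULE)         # 6 layers: Conv4_3 ... Conv9_2
cu_ssd320 = anchor_sizes(320, [5, 10] + SCHEDULE)   # 7 layers: Conv3_3 ... Conv9_2
print(ssd300)
print(cu_ssd320)
```

Under this assumed schedule all entries of Tab. 2 are reproduced except the Min_size of Conv7_2, where the schedule gives 162 for SSD and 173 for CU-SSD rather than the listed 152 and 162.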

Fig. 8 Comparison of anchor boxes of different feature layers

Tab. 3 Dataset composition

Category	Images	Objects
Total	1 510	14 596
Plane	1 000	7 482
Car	510	7 114

Fig. 9 Loss curves during training

Tab. 4 Performance results of different models

Model	car AP/%	plane AP/%	mAP/%	Frame rate/(frame·s⁻¹)	Size/MB
SSD	69.33	91.84	80.58	13.6	91
FSSD	62.04	90.24	76.14	13.0	105
RFBNet	73.80	93.88	83.84	9.9	142
YOLOv3	86.07	94.06	90.07	12.0	235
CU-SSD	93.44	96.12	94.78	9.0	79
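The percentages in the abstract are relative gains over SSD computed from Tab. 4, and the mAP column is consistent with the mean of the car and plane APs. The check below reproduces them from the table values; the helper names are illustrative.

```python
# AP and mAP values (%) copied from Tab. 4.
AP = {
    "SSD": {"car": 69.33, "plane": 91.84},
    "CU-SSD": {"car": 93.44, "plane": 96.12},
}
MAP = {"SSD": 80.58, "CU-SSD": 94.78}


def mean_ap(ap_by_class):
    """mAP as the unweighted mean of per-class APs."""
    return sum(ap_by_class.values()) / len(ap_by_class)


def relative_gain(new, old):
    """Relative improvement in percent: (new - old) / old * 100."""
    return (new - old) / old * 100


for cls in ("car", "plane"):
    print(cls, round(relative_gain(AP["CU-SSD"][cls], AP["SSD"][cls]), 2))  # car 34.78, plane 4.66
print("mAP", round(mean_ap(AP["CU-SSD"]), 2))                                # 94.78
print("mAP gain", round(relative_gain(MAP["CU-SSD"], MAP["SSD"]), 2))        # 17.62
```

In absolute terms the mAP rises by 14.20 percentage points; the 17.62% figure is that difference divided by SSD's 80.58%.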

Tab. 5 Experimental results with different numbers of fused feature layers

Fused layers	car AP/%	plane AP/%	mAP/%
0	89.49	94.34	91.92
2	90.04	94.84	92.44
3	90.85	95.06	92.96
4	92.44	96.24	94.34
5	93.44	96.12	94.78
6	93.12	95.90	94.51

Fig. 10 P-R curves of car and plane categories
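Each AP in Tab. 4 summarises one P-R curve as the area under it. This section does not say whether the PASCAL VOC 11-point or all-point interpolation was used; the sketch below assumes all-point interpolation (precision made monotonically non-increasing, then integrated over recall). The function names are hypothetical, and the scores and labels in the usage lines are made-up toy data.

```python
import numpy as np


def pr_curve(scores, is_tp, num_gt):
    """Precision and recall after each detection, sorted by descending score."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp_cum = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    recall = tp_cum / num_gt
    precision = tp_cum / np.arange(1, len(tp_cum) + 1)  # TP / detections so far
    return recall, precision


def average_precision(recall, precision):
    """All-point interpolated AP: area under the precision envelope."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):      # make precision non-increasing
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]    # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


rec, prec = pr_curve([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2)
print(average_precision(rec, prec))  # 0.5 * 1 + 0.5 * (2/3), about 0.8333
```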

Fig. 11 Detection results of SSD and CU-SSD on the car category

Fig. 12 Detection results of SSD and CU-SSD on the plane category

Tab. 6 Performance comparison of different improved modules

Group	Conv3_3	Fusion	anchors	car AP/%	plane AP/%	mAP/%	Frame rate/(frame·s⁻¹)
1	×	×	×	69.33	91.84	80.58	13.6
2	×	√	×	63.78	90.51	77.15	12.2
3	×	×	√	85.18	94.15	89.66	9.6
4	×	√	√	89.19	94.28	91.74	9.6
5	√	×	√	89.49	94.34	91.92	9.2
6	√	√	√	93.44	96.12	94.78	9.0
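Reading Tab. 6 as pairwise differences shows how each module contributes: the fusion module alone lowers mAP with the original anchors (group 2), but adds about 2 to 3 points once the anchors are redesigned (groups 4 and 6). The snippet below computes these differences in percentage points from the table; the labels are illustrative.

```python
# (Conv3_3, Fusion, anchors, mAP %) for each group in Tab. 6.
GROUPS = {
    1: (False, False, False, 80.58),
    2: (False, True, False, 77.15),
    3: (False, False, True, 89.66),
    4: (False, True, True, 91.74),
    5: (True, False, True, 91.92),
    6: (True, True, True, 94.78),
}


def delta(a, b):
    """mAP change in percentage points from group a to group b."""
    return round(GROUPS[b][3] - GROUPS[a][3], 2)


print("anchors alone        :", delta(1, 3))  # 9.08
print("fusion alone         :", delta(1, 2))  # -3.43
print("fusion with anchors  :", delta(3, 4))  # 2.08
print("Conv3_3 with anchors :", delta(3, 5))  # 2.26
print("fusion added to 5    :", delta(5, 6))  # 2.86
```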
