Object detection algorithm for remote sensing images based on geometric adaptation and global perception

Journal of Computer Applications, 2023, Vol. 43, Issue 3: 916-922. DOI: 10.11772/j.issn.1001-9081.2022010071

Special Issue: Multimedia computing and computer simulation

Yongxiang GU¹,², Xin LAN¹,², Boyi FU¹,², Xiaolin QIN¹,²

1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China
2. University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2022-01-19; Revised: 2022-03-01; Accepted: 2022-03-07; Online: 2022-03-11; Published: 2023-03-10
Contact: Xiaolin QIN
About the authors:
GU Yongxiang, born in 1997, from Suzhou, Jiangsu, M. S. candidate, CCF member. His research interests include deep learning and object detection.
LAN Xin, born in 1998, from Longyan, Fujian, M. S. candidate, CCF member. Her research interests include deep learning and object detection.
FU Boyi, born in 1998, from Yueyang, Hunan, M. S. candidate, CCF member. Her research interests include deep learning and object detection.
QIN Xiaolin, born in 1980, from Chongqing, Ph. D., research fellow, CCF member. His research interests include automated reasoning and artificial intelligence.
Supported by:
National Academy of Science Alliance Collaborative Program (Chengdu Branch of Chinese Academy of Sciences - Chongqing Academy of Science and Technology); Science and Technology Service Network Initiative (STS) Key Regional Program (Type A) of Chinese Academy of Sciences (KFJ-STS-QYZD-2021-21-001); Sichuan Science and Technology Program (2019ZDZX0006); "Western Young Scholars" Project of Chinese Academy of Sciences (201899); Talent Special Project of Organization Department of Sichuan Provincial Party Committee

Abstract

To address the small object size, arbitrary object orientation, and complex backgrounds typical of remote sensing images, an object detection algorithm based on geometric adaptation and global perception was proposed on the basis of the YOLOv5 (You Only Look Once version 5) algorithm. Firstly, deformable convolutions and adaptive spatial attention modules were stacked alternately in series through dense connections, yielding a Dense Context-Aware Module (DenseCAM) that models local geometric features while making full use of semantic and location information from different levels. Secondly, a Transformer was introduced at the end of the backbone network, which enhanced the global perception ability of the model at a low cost and modeled the relationships between objects and scene content. On the UCAS-AOD and RSOD datasets, the proposed algorithm increased the mean Average Precision (mAP) by 1.8 and 1.5 percentage points, respectively, compared with the YOLOv5s6 algorithm. Experimental results show that the proposed algorithm can effectively improve the precision of object detection in remote sensing images.

Key words: remote sensing image, object detection, Transformer, deformable convolution, spatial attention, YOLOv5
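As a rough illustration of what "global perception" means for a Transformer placed at the end of a backbone, the sketch below applies single-head scaled dot-product self-attention to a flattened feature map, so that every spatial position is mixed with every other position. It is not the authors' implementation: the identity query/key/value projections, the single head, the function names, and the toy 2 x 2 feature map are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_feature_map(fmap):
    """Turn an H x W grid of d-dimensional vectors into a list of H*W tokens."""
    return [vec for row in fmap for vec in row]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a real Transformer layer learns these projections.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical 2 x 2 feature map with 2 channels per position.
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.5, 0.5]],
]
tokens = flatten_feature_map(feature_map)
outputs, weights = self_attention(tokens)
```

Every entry of `weights` is strictly positive, which is the sense in which each output position can draw on the whole feature map rather than on a local neighbourhood only.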


CLC Number: TP751.1

Yongxiang GU, Xin LAN, Boyi FU, Xiaolin QIN. Object detection algorithm for remote sensing images based on geometric adaptation and global perception[J]. Journal of Computer Applications, 2023, 43(3): 916-922.


Figures/Tables

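Dataset	Algorithm	Parameters/10⁶	Floating-point operations/GFLOPs	Category	P/%	R/%	AP₅₀/%	mAP/%
UCAS-AOD	YOLOv3-SPP	62.6	155.8	Car	91.3	93.1	92.9	56.5
UCAS-AOD	YOLOv3-SPP	62.6	155.8	Airplane	98.6	98.7	99.3	72.9
UCAS-AOD	YOLOv3-SPP	62.6	155.8	Average	95.0	95.9	96.1	64.7
UCAS-AOD	YOLOv5s6	12.4	16.8	Car	89.7	92.4	91.8	55.1
UCAS-AOD	YOLOv5s6	12.4	16.8	Airplane	98.1	98.4	99.3	72.9
UCAS-AOD	YOLOv5s6	12.4	16.8	Average	93.9	95.1	95.5	64.0
UCAS-AOD	Proposed algorithm	13.7	17.0	Car	90.0	92.5	94.2	58.3
UCAS-AOD	Proposed algorithm	13.7	17.0	Airplane	97.6	98.3	99.3	73.4
UCAS-AOD	Proposed algorithm	13.7	17.0	Average	93.8	95.4	96.7	65.8
RSOD	YOLOv3-SPP	62.6	155.8	Airplane	89.9	90.1	94.1	61.8
RSOD	YOLOv3-SPP	62.6	155.8	Oil tank	95.1	98.1	98.4	77.4
RSOD	YOLOv3-SPP	62.6	155.8	Overpass	86.6	71.8	78.2	36.5
RSOD	YOLOv3-SPP	62.6	155.8	Playground	82.1	100.0	99.1	85.2
RSOD	YOLOv3-SPP	62.6	155.8	Average	88.4	90.0	92.4	65.2
RSOD	YOLOv5s6	12.4	16.8	Airplane	97.3	82.9	93.9	64.3
RSOD	YOLOv5s6	12.4	16.8	Oil tank	100.0	93.4	98.6	78.9
RSOD	YOLOv5s6	12.4	16.8	Overpass	84.6	61.1	66.7	30.3
RSOD	YOLOv5s6	12.4	16.8	Playground	99.5	100.0	99.5	87.0
RSOD	YOLOv5s6	12.4	16.8	Average	95.3	84.4	89.7	65.1
RSOD	Proposed algorithm	13.7	17.0	Airplane	96.1	87.6	94.4	65.5
RSOD	Proposed algorithm	13.7	17.0	Oil tank	99.5	94.7	97.8	80.1
RSOD	Proposed algorithm	13.7	17.0	Overpass	80.0	66.7	66.9	33.4
RSOD	Proposed algorithm	13.7	17.0	Playground	86.5	100.0	99.5	87.3
RSOD	Proposed algorithm	13.7	17.0	Average	90.5	87.2	89.6	66.6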
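
The mAP gains quoted in the abstract can be checked against the average rows of the comparison table above. The short sketch below copies those averages, together with the parameter and GFLOPs columns, and computes the difference between the proposed algorithm and YOLOv5s6. The dictionary layout and the names `map_percent`, `cost`, and `gain` are illustrative choices, not part of the paper.

```python
# mAP (%) from the average rows of the comparison table.
map_percent = {
    "UCAS-AOD": {"YOLOv3-SPP": 64.7, "YOLOv5s6": 64.0, "proposed": 65.8},
    "RSOD": {"YOLOv3-SPP": 65.2, "YOLOv5s6": 65.1, "proposed": 66.6},
}

# (parameters in millions, floating-point operations in GFLOPs) per algorithm.
cost = {
    "YOLOv3-SPP": (62.6, 155.8),
    "YOLOv5s6": (12.4, 16.8),
    "proposed": (13.7, 17.0),
}


def gain(dataset, baseline="YOLOv5s6"):
    """mAP gain of the proposed algorithm over a baseline, in percentage points."""
    scores = map_percent[dataset]
    return round(scores["proposed"] - scores[baseline], 1)


gains = {dataset: gain(dataset) for dataset in map_percent}
extra_params = round(cost["proposed"][0] - cost["YOLOv5s6"][0], 1)
extra_gflops = round(cost["proposed"][1] - cost["YOLOv5s6"][1], 1)

print(gains)         # {'UCAS-AOD': 1.8, 'RSOD': 1.5}
print(extra_params)  # 1.3 (million parameters)
print(extra_gflops)  # 0.2 (GFLOPs)
```

The differences match the 1.8 and 1.5 percentage points reported in the abstract, obtained with 1.3 million additional parameters and 0.2 additional GFLOPs relative to YOLOv5s6.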


Transformer	CAM	DenseCAM	P/%	R/%	AP₅₀/%	mAP/%	Parameters/10⁶	Floating-point operations/GFLOPs
—	—	—	93.9	95.1	95.5	64.0	12.4	16.8
√	—	—	94.8	94.6	95.8	64.6	12.4	16.7
—	√	—	95.1	95.3	96.5	65.0	16.7	17.6
—	—	√	95.3	95.2	96.5	65.1	13.7	17.1
√	—	√	93.8	95.4	96.7	65.8	13.7	17.0
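
To read the ablation table above more easily, the following sketch computes, for each configuration, the change in mAP and in parameter count relative to the baseline row (no Transformer, no CAM, no DenseCAM). The numbers are copied from the table; the `Row` type, the `label` helper, and the `deltas` dictionary are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class Row(NamedTuple):
    transformer: bool
    cam: bool
    dense_cam: bool
    map_percent: float  # mAP/%
    params_m: float     # parameters, in millions


rows = [
    Row(False, False, False, 64.0, 12.4),  # baseline
    Row(True, False, False, 64.6, 12.4),
    Row(False, True, False, 65.0, 16.7),
    Row(False, False, True, 65.1, 13.7),
    Row(True, False, True, 65.8, 13.7),
]
baseline = rows[0]


def label(row):
    """Human-readable name of the modules enabled in a row."""
    enabled = [
        name
        for name, on in (
            ("Transformer", row.transformer),
            ("CAM", row.cam),
            ("DenseCAM", row.dense_cam),
        )
        if on
    ]
    return " + ".join(enabled) or "baseline"


# (mAP change in percentage points, parameter change in millions) per configuration.
deltas = {
    label(row): (
        round(row.map_percent - baseline.map_percent, 1),
        round(row.params_m - baseline.params_m, 1),
    )
    for row in rows
}

for name, (d_map, d_params) in deltas.items():
    print(f"{name}: mAP {d_map:+.1f} points, parameters {d_params:+.1f} M")
```

On these figures, DenseCAM alone gives a slightly larger mAP gain than CAM (1.1 versus 1.0 points) while adding 1.3 million parameters instead of 4.3 million, and combining it with the Transformer reaches the full 1.8-point gain at the same parameter count.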

