Object detection in remote sensing image based on multi-scale feature fusion and weighted boxes fusion

doi:10.11772/j.issn.1001-9081.2024020252

Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 633-639.DOI: 10.11772/j.issn.1001-9081.2024020252

• Multimedia computing and computer simulation •

Zhongwei ZHANG¹, Jun WANG¹, Shudong LIU¹, Zhiheng WANG²

¹ School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
² School of Geology and Geomatics, Tianjin Chengjian University, Tianjin 300384, China

Received: 2024-03-11; Revised: 2024-04-23; Accepted: 2024-04-25; Online: 2024-06-04; Published: 2025-02-10
Contact: Shudong LIU
About author: ZHANG Zhongwei, born in 1986, native of Qiqihar, Heilongjiang, Ph. D., lecturer. Her research interests include deep learning, image processing, and pattern recognition.
WANG Jun, born in 1996, native of Qingdao, Shandong, M. S. candidate. Her research interests include image processing.
WANG Zhiheng, born in 1983, native of Yangquan, Shanxi, Ph. D., associate professor. His research interests include geographic information modeling and its application in disaster prevention and reduction.
Supported by: National Natural Science Foundation of China (41971310)

Original title (Chinese): 多尺度特征融合与加权框融合的遥感图像目标检测

Abstract:

Objects in remote sensing images vary widely in scale and aspect ratio, which makes them difficult to detect. To improve detection precision under these conditions, EW-YOLO (Efficient Weighted-YOLO) was proposed by improving the YOLO framework. Firstly, a multi-level feature fusion structure was introduced in the feature fusion section, in which a dual-branch residual module promotes the fusion of features at different scales. By cascading the feature fusion modules and adding a cross-layer feature fusion design, the ability to extract objects at different scales was improved, which further enhanced detection capability. Secondly, in the prediction section, a weighted detection head was proposed and Weighted Boxes Fusion (WBF) was introduced: each candidate box is weighted by its confidence score, and the prediction boxes are generated by fusion, which improves the detection precision for objects with different aspect ratios. Finally, to handle excessively large images, an image resampling technique was proposed, in which images are resampled to appropriate sizes and included in network training, addressing the low detection precision for large objects caused by cropping. Experimental results on the DOTA dataset show that the mean Average Precision (mAP) of the proposed method is 77.47%, which is 1.55 percentage points higher than that of the method based on the original YOLO framework, and that the proposed method outperforms current mainstream methods. The effectiveness of the proposed method was also verified on the HRSC and UCAS-AOD datasets.

Key words: remote sensing image, object detection, deep learning, multi-scale feature fusion, Weighted Boxes Fusion (WBF)
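The abstract describes weighting each candidate box by its confidence score and fusing the candidates into a prediction box. The following is a minimal sketch of generic single-class WBF in the spirit of Solovyev et al. (reference 17), not the weighted detection head of EW-YOLO, whose details are not given on this page. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and strictly positive scores; the function names and the `iou_thr` default are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: List[Tuple[Box, float]]) -> Tuple[Box, float]:
    """Confidence-weighted mean of the coordinates; mean confidence as score."""
    total = sum(score for _, score in cluster)  # assumed > 0
    box = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
    return box, total / len(cluster)


def weighted_boxes_fusion(
    boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.55
) -> List[Tuple[Box, float]]:
    """Fuse overlapping candidate boxes of one class into prediction boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters: List[List[Tuple[Box, float]]] = []
    fused: List[Tuple[Box, float]] = []
    for i in order:
        candidate = (tuple(boxes[i]), scores[i])
        # Find the fused box that overlaps this candidate the most.
        best_k, best_iou = -1, iou_thr
        for k, (fbox, _) in enumerate(fused):
            overlap = iou(fbox, candidate[0])
            if overlap >= best_iou:
                best_k, best_iou = k, overlap
        if best_k < 0:
            # No sufficiently overlapping fused box: start a new cluster.
            clusters.append([candidate])
            fused.append(candidate)
        else:
            # Add to the matched cluster and recompute its fused box.
            clusters[best_k].append(candidate)
            fused[best_k] = fuse(clusters[best_k])
    return fused


# Hypothetical candidates: two overlapping boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
fused = weighted_boxes_fusion(boxes, scores)
for box, score in fused:
    print(box, score)
```

Unlike non-maximum suppression, which keeps only the highest-scoring box of a group, this fusion lets every overlapping candidate contribute to the final coordinates in proportion to its confidence.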

CLC Number: TP751

Zhongwei ZHANG, Jun WANG, Shudong LIU, Zhiheng WANG. Object detection in remote sensing image based on multi-scale feature fusion and weighted boxes fusion[J]. Journal of Computer Applications, 2025, 45(2): 633-639.

Figures/Tables 12

References 21

1	LIAO Y R, WANG H N, LIN C B, et al. Research progress of deep learning-based object detection of optical remote sensing image[J]. Journal on Communications, 2022, 43(5): 190-203.
2	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587.
3	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
4	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788.
5	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2024-03-01].
6	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
7	Ultralytics. YOLOv5[EB/OL]. [2024-03-01].
8	DING J, XUE N, LONG Y, et al. Learning RoI Transformer for oriented object detection in aerial images[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 2844-2853.
9	YANG X, YANG J, YAN J, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8231-8240.
10	YANG X, YAN J, FENG Z, et al. R³Det: refined single-stage detector with feature refinement for rotating object[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3163-3171.
11	HAN J, DING J, XUE N, et al. ReDet: a rotation-equivariant detector for aerial object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2785-2794.
12	LI W, CHEN Y, HU K, et al. Oriented RepPoints for aerial object detection[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 1819-1828.
13	YU Y, DA F. Phase-shifting coder: predicting accurate orientation in oriented object detection[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 13354-13363.
14	WANG X L, LIANG Z Y, LIU T. Feature attention pyramid-based remote sensing image object detection method[J]. National Remote Sensing Bulletin, 2023, 27(2): 492-501.
15	SHAO Y H, ZHANG D, CHU H Y, et al. A review of YOLO object detection based on deep learning[J]. Journal of Electronics and Information Technology, 2022, 44(10): 3697-3708.
16	XU X, JIANG Y, CHEN W, et al. DAMO-YOLO: a report on real-time object detection design[EB/OL]. [2024-03-01].
17	SOLOVYEV R, WANG W, GABRUSEVA T. Weighted boxes fusion: ensembling boxes from different object detection models[J]. Image and Vision Computing, 2021, 107: No.104117.
18	YANG X, YAN J. On the arbitrary-oriented object detection: classification based approaches revisited[J]. International Journal of Computer Vision, 2022, 130(5): 1340-1365.
19	LIU C, ZHANG S, HU M, et al. Object detection in remote sensing images based on adaptive multi-scale feature fusion method[J]. Remote Sensing, 2024, 16(5): No.907.
20	XIE X, YOU Z H, CHEN S B, et al. Feature enhancement and alignment for oriented object detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 778-787.
21	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626.

Method	AP															mAP
Method	PL	BD	BR	GTF	SV	LV	SH	TC	BC	ST	SBF	RA	HA	SP	HC	mAP
RoI Transformer	88.64	78.52	43.44	75.92	68.81	73.68	83.59	90.74	77.27	81.46	58.39	53.54	62.83	58.93	47.67	69.56
SCRDet	89.98	80.65	52.09	68.36	68.36	60.32	72.41	90.85	87.94	86.86	65.02	66.68	66.25	68.24	65.21	72.61
R³Det	89.49	81.17	50.53	66.10	70.92	78.66	78.21	90.81	85.26	84.23	61.81	63.77	68.16	69.83	67.17	73.74
Ref. [19]	89.26	82.26	51.33	68.49	78.88	74.14	85.59	90.88	84.94	85.73	60.78	64.76	65.72	71.32	59.08	74.21
FEADet	88.60	78.80	50.11	72.85	80.54	80.67	87.40	90.80	84.73	83.90	65.11	62.79	66.55	69.86	52.84	74.37
ReDet	88.79	82.64	53.97	74.00	78.13	84.06	88.04	90.89	87.78	85.75	61.76	60.39	75.96	68.07	63.59	76.25
O-RP	89.53	84.07	59.86	71.76	79.95	80.03	87.33	90.84	87.54	85.23	59.15	66.37	75.23	73.75	57.23	76.52
PSC	89.65	86.37	51.76	63.42	81.21	84.63	88.29	90.80	85.39	87.63	61.00	66.41	75.01	81.77	66.20	77.32
EW-YOLO	88.62	84.70	54.67	64.01	80.68	84.59	88.34	90.80	86.22	87.53	60.71	66.48	76.14	82.69	65.82	77.47
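The mAP column is the arithmetic mean of the per-category AP values. As a small worked check, the sketch below recomputes it for the EW-YOLO row, using the category abbreviations and values exactly as they appear in the table; the variable names are illustrative.

```python
# Per-category AP values (%) copied from the EW-YOLO row of the table above.
ew_yolo_ap = {
    "PL": 88.62, "BD": 84.70, "BR": 54.67, "GTF": 64.01, "SV": 80.68,
    "LV": 84.59, "SH": 88.34, "TC": 90.80, "BC": 86.22, "ST": 87.53,
    "SBF": 60.71, "RA": 66.48, "HA": 76.14, "SP": 82.69, "HC": 65.82,
}

# mAP is the unweighted mean over the 15 categories.
map_value = sum(ew_yolo_ap.values()) / len(ew_yolo_ap)
map_rounded = round(map_value, 2)
print(f"mAP = {map_rounded:.2f}%")
```

The rounded result agrees with the 77.47 reported in the last column and in the abstract.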

Method	mAP
SBD	93.70
R³Det	96.01
O-RP	97.26
ReDet	97.63
Proposed method	97.75

Method	mAP
RoI Transformer	88.95
O-RP	90.11
RetinaNet-H	95.47
R³Det	96.17
Proposed method	98.10

Method	mAP@0.5/%	Params/10⁶	GFLOPs
YOLOv5-S	75.9	7.5	17.6
YOLOX-S	69.9	9.0	26.8
YOLOv7-tiny	74.6	6.2	13.8
YOLOv8-S	69.8	11.2	28.6
EW-YOLO	77.5	7.7	16.3

Method	WHead	EFPN	mAP/%
baseline			75.92
baseline+WHead	√		76.34
baseline+EFPN		√	77.02
baseline+WHead+EFPN	√	√	77.47
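The ablation rows can be read as gains over the baseline. The short sketch below computes those gains from the values in the table above (baseline mAP 75.92%); the dictionary and variable names are illustrative.

```python
# mAP values (%) copied from the ablation table above.
ablation = {
    "baseline": 75.92,
    "baseline+WHead": 76.34,
    "baseline+EFPN": 77.02,
    "baseline+WHead+EFPN": 77.47,
}

# Gain of each variant over the baseline, in percentage points.
gains = {
    name: round(value - ablation["baseline"], 2)
    for name, value in ablation.items()
    if name != "baseline"
}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
```

The combined gain of 1.55 percentage points is the improvement over the original YOLO framework quoted in the abstract.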

Method	WHead	EFPN	mAP/%
baseline			95.08
baseline+WHead	√		96.41
baseline+EFPN		√	96.65
baseline+WHead+EFPN	√	√	97.75
