Remote sensing image small target detection based on improved YOLOv3

doi:10.11772/j.issn.1001-9081.2021101802

Journal of Computer Applications, 2022, Vol. 42, Issue 12: 3723-3732. DOI: 10.11772/j.issn.1001-9081.2021101802

Special Issue: Artificial Intelligence

Hao FENG¹, Chaobing HUANG¹, Yuanqiao WEN²

1. School of Information Engineering, Wuhan University of Technology, Wuhan Hubei 430070, China
2. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan Hubei 430063, China

Received: 2021-10-22; Revised: 2022-01-10; Accepted: 2022-01-14; Online: 2022-01-19; Published: 2022-12-10
Contact: Chaobing HUANG
About the authors: FENG Hao, born in 1996, native of Chongqing, M.S. candidate. His research interests include information processing, and image processing and recognition.
WEN Yuanqiao, born in 1975, native of Songzi, Hubei, Ph.D., professor. His research interests include water traffic safety and intelligent ships.
Supported by: National Natural Science Foundation of China (52072287)

Abstract

The YOLOv3 (You Only Look Once version 3) algorithm is widely used in target detection tasks. Although some improved algorithms based on YOLOv3 have made progress, they still suffer from insufficient representation ability and low detection accuracy, especially on small targets. To address these problems, a small target detection algorithm for remote sensing images based on improved YOLOv3 was proposed. Firstly, the K-means Transformation (K-means-T) algorithm was used to optimize the anchor box sizes, improving the match between the prior boxes and the ground-truth boxes. Secondly, the confidence loss function was optimized to address the imbalance between hard and easy samples. Finally, an attention mechanism was introduced to improve the algorithm's ability to perceive detailed information. Experimental results on the RSOD dataset show that, compared with the original YOLOv3 and YOLOv4 algorithms, the proposed algorithm increases the Average Precision (AP) on the small target class "aircraft" by 7.3 and 5.9 percentage points respectively, indicating that it detects small targets in remote sensing images effectively and with higher accuracy.

Key words: small target detection, YOLOv3 (You Only Look Once version 3), K-means Transformation (K-means-T), confidence loss function, attention mechanism
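
The abstract does not state the form of the optimized confidence loss. As one illustration of how a loss can down-weight easy samples so that hard ones dominate training, the sketch below implements the focal loss of reference [8] for a binary objectness score. The function name `focal_bce` and the defaults `alpha=0.25` and `gamma=2.0` are assumptions borrowed from common focal loss settings, not values taken from this article.

```python
import numpy as np

def focal_bce(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample focal binary cross-entropy (illustrative, not the article's loss).

    p: predicted objectness probabilities in (0, 1)
    y: ground-truth labels, 1 for object and 0 for background
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    # Probability the model assigns to the true class of each sample.
    p_t = np.where(y == 1, p, 1.0 - p)
    # Class-balancing weight: alpha for positives, 1 - alpha for negatives.
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# Two positive samples: one easy (p = 0.95) and one hard (p = 0.30).
easy, hard = focal_bce([0.95, 0.30], [1, 1])
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which makes the modulating factor easy to check in isolation.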

CLC Number:

TP391.4

Hao FENG, Chaobing HUANG, Yuanqiao WEN. Remote sensing image small target detection based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(12): 3723-3732.

References

1	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3431-3440. DOI: 10.1109/cvpr.2015.7298965
2	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. DOI: 10.1109/cvpr.2014.81
3	GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. DOI: 10.1109/iccv.2015.169
4	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
5	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
6	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2021-09-10].
7	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2021-09-14].
8	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327. DOI: 10.1109/tpami.2018.2858826
9	WANG F L, SU J Y. Based on the improved YOLOV3 small target detection algorithm[C]// Proceedings of the IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference. Piscataway: IEEE, 2021: 2155-2159. DOI: 10.1109/imcec51613.2021.9482076
10	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. DOI: 10.1109/cvpr.2017.106
11	KISANTAL M, WOJNA Z, MURAWSKI J, et al. Augmentation for small object detection[EB/OL]. (2019-02-19) [2021-08-15]. DOI: 10.5121/csit.2019.91713
12	LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419.
13	SHAO H X, ZENG D. Classification and recognition of underwater small targets based on improved YOLOv3 algorithm[J]. Journal of Shanghai University (Natural Science Edition), 2021, 27(3): 481-491. DOI: 10.12066/j.issn.1007-2861.2279
14	YU Y, LI S J, CHEN L, et al. Ship target detection based on improved YOLO v2[J]. Computer Science, 2019, 46(8): 332-336.
15	YE K Q, FANG Z B, HUANG X J, et al. Research on small target detection algorithm based on improved YOLOv3[C]// Proceedings of the 5th International Conference on Mechanical, Control and Computer Engineering. Piscataway: IEEE, 2020: 1467-1470. DOI: 10.1109/icmcce51767.2020.00321
16	REZAEE M, ZHANG Y, MISHRA R, et al. Using a VGG-16 network for individual tree species detection with an object-based approach[C]// Proceedings of the 10th IAPR Workshop on Pattern Recognition in Remote Sensing. Piscataway: IEEE, 2018: 1-7. DOI: 10.1109/prrs.2018.8486395
17	LI B Q, HE Y Y. An improved ResNet based on the adjustable shortcut connections[J]. IEEE Access, 2018, 6: 18967-18974. DOI: 10.1109/access.2018.2814605
18	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787. DOI: 10.1109/cvpr42600.2020.01079
19	WANG J J, WEI J, MEI S H, et al. Improved YOLOv3 for small target detection in remote sensing images[J]. Computer Engineering and Applications, 2021, 57(20): 133-141.

Algorithm	mAP@0.5	F1	AP (aircraft)	F1 (aircraft)
YOLOv3	0.833	0.825	0.850	0.850
YOLOv4 [7]	0.862	0.835	0.864	0.840
EfficientDet [18]	0.881	0.858	0.603	0.620
Algorithm of Ref. [19]	0.903	0.865	-	-
YOLOv3 [15]	0.827	0.835	0.833	0.830
YOLOv3-AKT	0.903	0.870	0.923	0.900
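
The percentage-point gains quoted in the abstract can be recomputed from the AP (aircraft) column of this table. The short check below is only arithmetic on the tabulated values; the helper name `gain_in_points` is introduced here for illustration.

```python
# AP on the "aircraft" class, copied from the comparison table above.
ap_aircraft = {"YOLOv3": 0.850, "YOLOv4": 0.864, "YOLOv3-AKT": 0.923}

def gain_in_points(proposed, baseline):
    """Difference between two AP values, expressed in percentage points."""
    return round((proposed - baseline) * 100, 1)

gains = {
    name: gain_in_points(ap_aircraft["YOLOv3-AKT"], ap)
    for name, ap in ap_aircraft.items()
    if name != "YOLOv3-AKT"
}
print(gains)
```

The two values agree with the 7.3 and 5.9 percentage points reported for YOLOv3 and YOLOv4 respectively.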

Feature map	52×52	26×26	13×13
Anchor [15]	(8, 8), (11, 12), (15, 14)	(18, 19), (23, 24), (30, 32)	(40, 44), (51, 58), (145, 178)
Anchor-T (c=1)	(8, 8), (14, 15), (22, 20)	(28, 30), (38, 40), (53, 56)	(73, 81), (96, 109), (290, 356)
Anchor-T (c=3)	(4, 4), (8, 9), (15, 14)	(21, 22), (31, 33), (46, 49)	(67, 74), (91, 103), (290, 356)
Anchor-T (c=5)	(4, 4), (7, 8), (12, 12)	(16, 17), (23, 24), (38, 40)	(60, 66), (84, 95), (290, 356)
Anchor-T (c=7)	(4, 4), (7, 8), (11, 11)	(15, 16), (20, 21), (28, 30)	(40, 44), (66, 75), (290, 356)
Anchor-T (c=9)	(4, 4), (7, 7), (11, 10)	(14, 15), (19, 20), (26, 28)	(36, 40), (48, 54), (290, 356)
Anchor [13]	(4, 4), (10, 11), (18, 17)	(24, 26), (35, 36), (49, 53)	(70, 77), (93, 106), (290, 356)
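
Anchor sets like those in this table are typically obtained by clustering the widths and heights of the ground-truth boxes. The sketch below shows the standard IoU-based K-means used for YOLO anchors, in which a box is assigned to the anchor it overlaps most. It is not the article's K-means-T: the transformation step and the parameter c are not described on this page, so they are left out. The function names and the synthetic boxes are assumptions made for illustration.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Plain K-means on box sizes with distance d = 1 - IoU."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IoU distance is the same as largest IoU.
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)
        # Move each anchor to the mean size of its cluster; keep empty ones.
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
            for j in range(k)
        ])
        if np.allclose(new, anchors):
            break
        anchors = new
    # Sort by area so small anchors map to the high-resolution feature map.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Synthetic box sizes scattered around three made-up centres.
rng = np.random.default_rng(1)
boxes = np.vstack([rng.normal(c, 1.0, size=(50, 2))
                   for c in ((10, 10), (40, 44), (145, 178))])
anchors = kmeans_anchors(boxes, k=3)
```

Like any K-means, the result depends on the random initialisation and can settle in a local optimum, so in practice the clustering is usually repeated with several seeds and the run with the highest mean IoU is kept.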

Algorithm	mAP@0.5	F1	AP (aircraft)
Original YOLOv3	0.840	0.850	0.852
YOLOv3 [15]	0.827	0.835	0.833
YOLOv3 [13]	0.853	0.855	0.870
YOLOv3-T	0.868	0.860	0.884

CA module position	mAP@0.5	AP (aircraft)	F1 (aircraft)
YOLOv3	0.840	0.850	0.850
Detection head 1	0.849	0.882	0.860
Detection head 2	0.846	0.884	0.860
Detection head 3	0.869	0.901	0.880
