Small object detection algorithm from drone perspective based on improved YOLOv8n

doi:10.11772/j.issn.1001-9081.2023111644

Journal of Computer Applications, 2024, Vol. 44, Issue (11): 3603-3609. DOI: 10.11772/j.issn.1001-9081.2023111644

• Multimedia computing and computer simulation •

Tao LIU¹,², Shihong JU¹, Yimeng GAO¹

¹ School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
² Department of Basic Teaching, Liaoning Technical University, Huludao, Liaoning 125105, China

Received: 2023-12-01 Revised: 2024-04-05 Accepted: 2024-04-12 Online: 2024-05-30 Published: 2024-11-10
Contact: Shihong JU
About author: LIU Tao, born in 1981, M. S., associate professor. His research interests include computer vision and intelligent data processing.
GAO Yimeng, born in 1996, M. S. candidate. Her research interests include image processing, pattern recognition, and artificial intelligence.
Supported by: National Natural Science Foundation of China (61172144)

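Chinese title: 基于改进YOLOv8n的无人机视角下小目标检测算法

Authors (Chinese names): 刘涛¹,², 鞠事宏¹, 高一萌¹

Author details from the Chinese record: LIU Tao (born 1981), male, from Dingzhou, Hebei, associate professor, M. S.; main research interests: computer vision and intelligent data processing.
GAO Yimeng (born 1996), female, from Anshan, Liaoning, M. S. candidate; main research interests: image processing, pattern recognition, and artificial intelligence.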

Abstract

To address the low accuracy of object detection algorithms on small objects in drone-view images, a new small object detection algorithm named SFM-YOLOv8 was proposed by improving the backbone network and attention mechanism of YOLOv8. Firstly, SPace-to-Depth Convolution (SPDConv), which is suited to low-resolution images and small object detection, was integrated into the backbone network to retain discriminative feature information and improve the perception of small objects. Secondly, a multi-branch attention module named MCA (Multiple Coordinate Attention) was introduced to enhance the spatial and channel information of the feature layers. Then, a convolution module named FE-C2f, which fuses FasterNet and Efficient Multi-scale Attention (EMA), was constructed to reduce the computational cost and make the model more lightweight. In addition, a Minimum Point Distance based Intersection over Union (MPDIoU) loss function was introduced to improve the accuracy of the algorithm. Finally, a small object detection layer was added to the network structure of YOLOv8n to retain more location information and detailed features of small objects. Experimental results show that, compared with YOLOv8n, SFM-YOLOv8 achieves a 4.37 percentage point increase in mAP₅₀ (mean Average Precision) with a 5.98% reduction in parameters on the VisDrone-DET2019 dataset. Compared with related mainstream models, SFM-YOLOv8 achieves higher accuracy and meets real-time detection requirements.
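The space-to-depth step behind SPDConv can be illustrated with a minimal sketch. It assumes the usual formulation of the operation: a feature map is sliced into scale² interleaved sub-maps that are stacked along the channel axis, so resolution drops without discarding any values, and a non-strided convolution (not shown here) follows. The function name `space_to_depth` and the nested-list layout (channels, height, width) are choices made for this illustration only, not the authors' implementation.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange a C x H x W nested list into (C * scale^2) x H/scale x W/scale.

    Every value of the input is kept; spatial detail is moved into channels
    instead of being dropped by a strided convolution or pooling layer.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    for dy in range(scale):          # row offset of the sub-map
        for dx in range(scale):      # column offset of the sub-map
            for c in range(channels):
                # Take every `scale`-th row and column, starting at (dy, dx).
                sub = [row[dx::scale] for row in feature_map[c][dy::scale]]
                out.append(sub)
    return out


# A single-channel 4 x 4 map becomes four 2 x 2 channels with scale factor 2.
x = [[[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11],
      [12, 13, 14, 15]]]
y = space_to_depth(x, scale=2)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width of the result
```

Because the output holds exactly the same 16 values as the input, the downsampling itself loses no information, which is the property the abstract relies on for small objects.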

Key words: small object detection, YOLOv8, feature extraction, attention mechanism, loss function

CLC Number: TP391.41

Tao LIU, Shihong JU, Yimeng GAO. Small object detection algorithm from drone perspective based on improved YOLOv8n[J]. Journal of Computer Applications, 2024, 44(11): 3603-3609.

刘涛, 鞠事宏, 高一萌. 基于改进YOLOv8n的无人机视角下小目标检测算法[J]. 计算机应用, 2024, 44(11): 3603-3609.

Figures/Tables 11

Fig. 1 Network structure of SFM-YOLOv8

Fig. 2 Illustration of SPDConv with scale factor of 2

Fig. 3 Implementation process of MCA

Fig. 4 Structure comparison of PConv and FasterNet Block

Fig. 5 Network structure of EMA attention

Fig. 6 Network structure of FE-C2f

Fig. 7 Parameters of MPDIoU
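As a rough illustration of the quantities involved in MPDIoU, the sketch below follows the definition as commonly stated in the MPDIoU preprint: the ordinary IoU is reduced by the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared diagonal of the input image. The function names, the `(x1, y1, x2, y2)` box layout, and the example boxes are assumptions for this illustration; the exact form used in SFM-YOLOv8 is not given on this page.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU between two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Standard IoU: overlap area divided by union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2  # top-left corners
    d2_sq = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2  # bottom-right corners

    # Normalize by the squared diagonal of the input image.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Loss form: zero only when the two boxes coincide."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)


# Hypothetical boxes in a 100 x 100 image.
overlapping = mpdiou((10, 10, 50, 50), (20, 20, 60, 60), 100, 100)
disjoint = mpdiou((0, 0, 10, 10), (50, 50, 60, 60), 100, 100)
print(overlapping, disjoint)
```

Unlike plain IoU, the value keeps changing for boxes that do not overlap at all (it becomes negative as the corners move apart), so the loss still distinguishes a near miss from a far miss.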

Tab. 1 Ablation experiment results

Model	Parameters/10⁶	GFLOPs	mAP₅₀/%	mAP₅₀₋₉₅/%
YOLOv8n	3.01	8.2	28.96	16.55
+FE-C2f	2.66	7.3	28.29	15.91
+MCA	3.05	8.3	29.26	16.68
+MPDIoU	3.01	8.2	29.37	16.79
+SPDConv	3.27	11.7	31.15	17.95
+std	2.93	12.4	31.83	18.64
SFM-YOLOv8	2.83	14.9	33.33	19.62
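The headline figures in the abstract can be checked directly against the first and last rows of Table 1. The short sketch below does only that arithmetic; the dictionary keys and helper names are chosen for this illustration, and the numbers are copied from the table.

```python
# First and last rows of Table 1 (parameters in millions).
baseline = {"params_m": 3.01, "gflops": 8.2, "map50": 28.96, "map50_95": 16.55}
improved = {"params_m": 2.83, "gflops": 14.9, "map50": 33.33, "map50_95": 19.62}


def percentage_point_gain(new, old):
    """Absolute difference between two percentages, in percentage points."""
    return round(new - old, 2)


def relative_change_pct(new, old):
    """Relative change with respect to the old value, in percent."""
    return round((new - old) / old * 100, 2)


map50_gain = percentage_point_gain(improved["map50"], baseline["map50"])
map50_95_gain = percentage_point_gain(improved["map50_95"], baseline["map50_95"])
param_change = relative_change_pct(improved["params_m"], baseline["params_m"])
gflops_change = relative_change_pct(improved["gflops"], baseline["gflops"])

print(map50_gain, map50_95_gain, param_change, gflops_change)
```

The mAP₅₀ gain and the parameter reduction match the 4.37 percentage points and 5.98% quoted in the abstract. The same rows also show that GFLOPs rise from 8.2 to 14.9, so the parameter saving does not translate into lower computation for the full model.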

Tab. 2 Comparative experiment results

Model	mAP₅₀/%	mAP₅₀₋₉₅/%	FPS
Faster R-CNN	22.3	16.3	15
RetinaNet	24.1	16.9	/
Cascade R-CNN	25.6	16.1	/
YOLOv5n	27.9	15.9	61
YOLOv5s	29.7	16.2	52
YOLOv6n	25.8	14.8	54
YOLOv8n	29.0	16.6	60
BD-YOLO	32.6	17.9	/
Improved YOLOv5s	32.9	18.4	50
SFM-YOLOv8	33.3	19.6	43

Fig. 8 Detection effects of SFM-YOLOv8

Fig. 9 Comparison of detection effects before and after model improvement
