Optimization model for small object detection based on multi-level feature bidirectional fusion

Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2871-2877. DOI: 10.11772/j.issn.1001-9081.2023091274

• Multimedia computing and computer simulation •

Yexin PAN¹,², Zhe YANG¹,²

1. School of Computer Science & Technology, Soochow University, Suzhou Jiangsu 215006, China
2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China

Received: 2023-09-18; Revised: 2023-11-28; Accepted: 2023-12-01; Online: 2024-03-15; Published: 2024-09-10
Contact: Zhe YANG
About author: PAN Yexin, born in 1999, M. S. candidate, CCF member. His research interests include computer vision and deep learning.
Supported by:
National Natural Science Foundation of China (62002253); Collaborative Education Program on Industry and Education of Ministry of Education (220606363154256); National College Student Innovation and Entrepreneurship Training Program Project (202210285042Z)

Abstract:

Small object detection remains a challenging problem in the field of object detection, because small objects carry few inherent features and deep networks lose feature information as depth increases. To address these issues, a model that optimizes small object detection through multiple feature enhancements at the network-structure level was proposed. Firstly, Spatial Pyramid Pooling (SPP) in the backbone network was replaced to optimize gradient calculation. Secondly, in the network neck, a multi-level bidirectional fusion that distinguishes feature levels was applied, and an Adaptive Feature Fusion (AFF) module was added to the output head, achieving multi-level feature enhancement. Experimental results show that on the COCO2017-val dataset, at an Intersection over Union (IoU) threshold of 0.5, the mean average precision of the proposed model reaches 61.4%, which is 4.7 percentage points higher than that of the currently popular YOLOv7 model. Meanwhile, the detection frame rate of the proposed model on a single GPU is 78.2 frame/s, which meets the speed requirements of industrial detection.
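The IoU threshold of 0.5 used in the evaluation can be illustrated with a minimal Python sketch. The box format `(x1, y1, x2, y2)` and the helper names `iou` and `is_true_positive` are assumptions made for illustration and are not taken from the paper.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


# The same 4-pixel localization error on a 10x10 object and a 100x100 object.
small_gt, small_pred = (0, 0, 10, 10), (4, 0, 14, 10)
large_gt, large_pred = (0, 0, 100, 100), (4, 0, 104, 100)
small_ok = is_true_positive(small_pred, small_gt)
large_ok = is_true_positive(large_pred, large_gt)
```

In this toy case the 4-pixel shift fails the 0.5 threshold for the small box but passes it for the large one, which is one geometric reason small objects are harder to score well on.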

Key words: deep learning, small object, object detection, computer vision, feature fusion

CLC Number: TP391

Yexin PAN, Zhe YANG. Optimization model for small object detection based on multi-level feature bidirectional fusion[J]. Journal of Computer Applications, 2024, 44(9): 2871-2877.

Model	Input size d	AP₅₀/%	AP/%	AP_S/%	AP_M/%	AP_L/%
SSD	300	43.1	25.1	6.6	22.6	35.5
DSSD	321	53.3	33.2	13.0	35.4	51.1
YOLOv3	416	44.0	21.6	5.0	22.4	35.5
RetinaNet	~500	55.7	34.7	18.3	38.2	47.1
OHEM++	~600	45.9	25.5	7.4	27.7	40.3
CoupleNet	600	54.8	34.3	13.4	38.1	52.0
YOLOv5	640	56.8	37.4	—	—	—
YOLOv7	640	56.7	41.7	18.8	42.4	51.9
YOLOv8	640	61.0	44.5	25.3	45.8	56.4
ViDT	—	59.6	43.3	23.2	42.5	55.8
Proposed model	640	61.4	44.1	25.9	44.6	55.6
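The gap to YOLOv7 quoted in the abstract is an absolute difference in percentage points, not a relative change. The short Python sketch below recomputes both from the AP₅₀ column of the table above; the dictionary and function names are illustrative assumptions.

```python
# AP50 values (in %) copied from the table above.
ap50 = {
    "YOLOv7": 56.7,
    "YOLOv8": 61.0,
    "ViDT": 59.6,
    "Proposed model": 61.4,
}


def gain_points(model: str, baseline: str) -> float:
    """Absolute gain in percentage points."""
    return round(ap50[model] - ap50[baseline], 1)


def gain_relative(model: str, baseline: str) -> float:
    """Relative gain, in percent of the baseline value."""
    return round(100.0 * (ap50[model] - ap50[baseline]) / ap50[baseline], 1)


points = gain_points("Proposed model", "YOLOv7")
relative = gain_relative("Proposed model", "YOLOv7")
```

Here `points` reproduces the 4.7 percentage points stated in the abstract, while the corresponding relative improvement is larger because it is measured against the baseline value.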

Model	P/%	R/%	AP₅₀/%	AP/%	Frame rate/(frame·s⁻¹)
Faster R-CNN	73.2	55.2	73.2	44.0	7.0
SSD	76.8	59.4	76.8	45.6	44.3
YOLOv3	77.2	52.5	77.2	39.8	74.0
YOLOv4	80.4	62.3	72.7	46.1	54.0
YOLOv5	79.9	77.9	83.1	59.4	90.9
YOLOv7	84.9	75.1	86.1	60.1	92.0
Proposed model	83.7	81.0	87.8	63.9	78.2
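Frame rates can be easier to compare as per-frame latency. The Python sketch below converts the frame-rate column of the table above into milliseconds per frame, under the assumption that frames are processed one at a time (the batch size is not stated on this page); the names used are illustrative.

```python
# Frame rates (frame/s) copied from the table above.
frame_rates = {
    "Faster R-CNN": 7.0,
    "SSD": 44.3,
    "YOLOv3": 74.0,
    "YOLOv4": 54.0,
    "YOLOv5": 90.9,
    "YOLOv7": 92.0,
    "Proposed model": 78.2,
}


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps


latencies = {name: round(latency_ms(fps), 2) for name, fps in frame_rates.items()}

# Extra time per frame of the proposed model relative to YOLOv7.
extra_ms = round(latencies["Proposed model"] - latencies["YOLOv7"], 2)
```

Under that assumption, the proposed model spends roughly 2 ms more per frame than YOLOv7 in exchange for the higher recall and AP reported in the same table.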

Model	Backbone	AP	AP₅₀	AP₇₅
RRNet	ResNet-50	32.9	55.8	31.3
GLSAN	ResNet-50	30.7	54.3	30.0
CascadeNet	ResNet-50	30.1	58.0	27.5
TridentNet	ResNet-101	22.5	43.3	20.5
MPFPN	ResNet-101	29.1	54.4	27.0
ClusDet	ResNeXt-101	32.4	56.2	31.6
QueryDet	ResNeXt-101	33.9	56.1	34.9
SAIC-FPN	ResNeXt-101	35.7	63.0	35.1
Proposed model	CSPDarkNet	34.3	61.8	33.2

SC	MFN	AFF	AP₅₀/%	Parameters/10⁶	Computation/GFLOPs	Frame rate/(frame·s⁻¹)
			56.8	7.0	16.0	101.0
√			58.3	13.5	21.2	96.2
	√		59.8	12.5	24.5	80.7
		√	57.8	7.3	17.0	95.3
√	√		58.2	13.6	22.2	76.4
	√	√	60.2	12.7	25.5	75.2
√	√	√	61.4	19.0	30.6	78.2

Neck	AP₅₀/%	AP_S/%	Parameters
FPN+PANet	56.8	18.8	1.00
NAS-FPN	53.2	—	0.73
BiFPN	55.5	—	0.69
SPD	59.1	21.9	2.18
MFN	61.4	25.9	1.24

Module	AP₅₀/%	Computation/GFLOPs	Parameters/10⁶
SPP	83.1	16.5	7.22
SPPF	83.4	16.5	7.23
CSPSPPFC	84.7	21.2	13.50
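Assuming the SPP and SPPF rows refer to the YOLOv5-style modules (an assumption, since this page does not define them), SPP applies stride-1 max pooling with 5×5, 9×9 and 13×13 kernels in parallel, whereas SPPF chains three 5×5 poolings. The pure-Python sketch below checks that the chained poolings give the same pooled maps as the larger kernels; it covers only the pooling step, not the surrounding convolutions, and does not model CSPSPPFC. The helper `max_pool_same` is hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 max pooling with 'same' output size.

    Out-of-range cells are ignored, which matches padding with -inf.
    """
    h, w = len(x), len(x[0])
    r = k // 2
    return [
        [
            max(
                x[i][j]
                for i in range(max(0, row - r), min(h, row + r + 1))
                for j in range(max(0, col - r), min(w, col + r + 1))
            )
            for col in range(w)
        ]
        for row in range(h)
    ]


rng = random.Random(0)
feature = [[rng.randint(0, 99) for _ in range(12)] for _ in range(10)]

# SPPF-style: three chained 5x5 poolings.
p1 = max_pool_same(feature, 5)
p2 = max_pool_same(p1, 5)
p3 = max_pool_same(p2, 5)

# SPP-style: parallel poolings with larger kernels.
same_9 = p2 == max_pool_same(feature, 9)
same_13 = p3 == max_pool_same(feature, 13)
```

If the assumption holds, the two modules differ in how the pooled maps are computed rather than in what they contain, which would be in keeping with the nearly identical computation and parameter counts of the SPP and SPPF rows.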
