Optimization model for small object detection based on multi-level feature bidirectional fusion

doi:10.11772/j.issn.1001-9081.2023091274

Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2871-2877.

• Multimedia computing and computer simulation •

Yexin PAN^1,2, Zhe YANG^1,2

^1. School of Computer Science & Technology, Soochow University, Suzhou Jiangsu 215006, China
^2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China

Received: 2023-09-18 Revised: 2023-11-28 Accepted: 2023-12-01 Online: 2024-03-15 Published: 2024-09-10
Contact: Zhe YANG
About author: PAN Yexin, born in 1999, native of Suzhou, Jiangsu, M.S. candidate, CCF member. His research interests include computer vision and deep learning.
Supported by: National Natural Science Foundation of China (62002253); Collaborative Education Program on Industry and Education of Ministry of Education (220606363154256); National College Student Innovation and Entrepreneurship Training Program Project (202210285042Z)

Abstract

Abstract: Small object detection has long been a difficult problem in object detection, because small objects carry few features of their own and further features are lost as network depth increases. To address these problems, a model is proposed that optimizes small object detection by applying feature enhancement at multiple points in the network structure. Firstly, Spatial Pyramid Pooling (SPP) in the backbone network is replaced to optimize gradient computation. Secondly, multi-level bidirectional fusion that distinguishes between feature levels is applied in the network neck, and an Adaptive Feature Fusion (AFF) module is added to the output head, achieving multi-level feature enhancement. Experimental results show that on the COCO2017-val dataset, at an Intersection over Union (IoU) threshold of 0.5, the mean Average Precision (mAP) of the proposed model reaches 61.4%, which is 4.7 percentage points higher than that of the currently popular YOLOv7 model. Meanwhile, the detection frame rate of the proposed model on a single GPU is 78.2 frame/s, which meets the speed requirements of industrial detection.

Key words: deep learning, small object, object detection, computer vision, feature fusion
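
The IoU threshold of 0.5 mentioned in the abstract can be illustrated with a minimal sketch. The box format (x1, y1, x2, y2), the helper names `iou` and `is_match`, and the "at least the threshold" matching rule are assumptions made for illustration; they are not taken from the paper. The example also hints at why small objects are hard: a shift of a few pixels already removes a large share of the overlap.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """Assumed rule: a prediction counts as correct if IoU >= threshold."""
    return iou(pred, gt) >= threshold


# A hypothetical 16x16-pixel ground-truth box and two shifted predictions.
ground_truth = (10, 10, 26, 26)
shifted_4px = (14, 10, 30, 26)   # IoU = 192 / 320 = 0.6
shifted_8px = (18, 10, 34, 26)   # IoU = 128 / 384, about 0.33

print(iou(shifted_4px, ground_truth), is_match(shifted_4px, ground_truth))
print(iou(shifted_8px, ground_truth), is_match(shifted_8px, ground_truth))
```

For this 16-pixel-wide box, a 4-pixel offset still passes the 0.5 threshold while an 8-pixel offset does not.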

CLC Number: TP391

Yexin PAN, Zhe YANG. Optimization model for small object detection based on multi-level feature bidirectional fusion[J]. Journal of Computer Applications, 2024, 44(9): 2871-2877.

Model	d	AP₅₀/%	AP/%	AP_S/%	AP_M/%	AP_L/%
SSD	300	43.1	25.1	6.6	22.6	35.5
DSSD	321	53.3	33.2	13.0	35.4	51.1
YOLOv3	416	44.0	21.6	5.0	22.4	35.5
RetinaNet	~500	55.7	34.7	18.3	38.2	47.1
OHEM++	~600	45.9	25.5	7.4	27.7	40.3
CoupleNet	600	54.8	34.3	13.4	38.1	52.0
YOLOv5	640	56.8	37.4	—	—	—
YOLOv7	640	56.7	41.7	18.8	42.4	51.9
YOLOv8	640	61.0	44.5	25.3	45.8	56.4
ViDT	—	59.6	43.3	23.2	42.5	55.8
Proposed model	640	61.4	44.1	25.9	44.6	55.6
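
The gap between the proposed model and YOLOv7 in the table above can be checked with a few lines of arithmetic. The sketch below copies only those two rows from the table; the dictionary layout and the helper name `gains` are illustrative choices, not part of the paper.

```python
# Values copied from the YOLOv7 and proposed-model rows of the table (all in %).
ROWS = {
    "YOLOv7":   {"AP50": 56.7, "AP": 41.7, "AP_S": 18.8, "AP_M": 42.4, "AP_L": 51.9},
    "Proposed": {"AP50": 61.4, "AP": 44.1, "AP_S": 25.9, "AP_M": 44.6, "AP_L": 55.6},
}


def gains(ours, baseline):
    """Per-metric difference in percentage points, rounded to one decimal."""
    return {metric: round(ours[metric] - baseline[metric], 1) for metric in ours}


diff = gains(ROWS["Proposed"], ROWS["YOLOv7"])
for metric, points in diff.items():
    print(f"{metric}: +{points} percentage points")
```

The AP₅₀ difference reproduces the 4.7 percentage points quoted in the abstract, and the largest difference among the five metrics is on AP_S, the small-object metric.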

Model	P/%	R/%	AP₅₀/%	AP/%	Frame rate/(frame·s⁻¹)
Faster R-CNN	73.2	55.2	73.2	44.0	7.0
SSD	76.8	59.4	76.8	45.6	44.3
YOLOv3	77.2	52.5	77.2	39.8	74.0
YOLOv4	80.4	62.3	72.7	46.1	54.0
YOLOv5	79.9	77.9	83.1	59.4	90.9
YOLOv7	84.9	75.1	86.1	60.1	92.0
Proposed model	83.7	81.0	87.8	63.9	78.2

Model	Backbone	AP	AP₅₀	AP₇₅
RRNet	ResNet-50	32.9	55.8	31.3
GLSAN	ResNet-50	30.7	54.3	30.0
CascadeNet	ResNet-50	30.1	58.0	27.5
TridentNet	ResNet-101	22.5	43.3	20.5
MPFPN	ResNet-101	29.1	54.4	27.0
ClusDet	ResNeXt-101	32.4	56.2	31.6
QueryDet	ResNeXt-101	33.9	56.1	34.9
SAIC-FPN	ResNeXt-101	35.7	63.0	35.1
Proposed model	CSPDarkNet	34.3	61.8	33.2

SC	MFN	AFF	AP₅₀/%	Params/10⁶	Computation/GFLOPs	Frame rate/(frame·s⁻¹)
			56.8	7.0	16.0	101.0
√			58.3	13.5	21.2	96.2
	√		59.8	12.5	24.5	80.7
		√	57.8	7.3	17.0	95.3
√	√		58.2	13.6	22.2	76.4
	√	√	60.2	12.7	25.5	75.2
√	√	√	61.4	19.0	30.6	78.2
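
The cost of each module combination in the ablation table above can be read off more easily as differences from the first row, which has no module enabled. The sketch below is only a reading aid: the tuple layout and the helper name `deltas_vs_baseline` are assumptions for illustration, and SC, MFN and AFF are used purely as the column labels given in the table.

```python
# Each row: (SC, MFN, AFF, AP50 in %, params in 10^6, GFLOPs, frame/s),
# copied from the ablation table; True marks a ticked column.
ABLATION = [
    (False, False, False, 56.8, 7.0, 16.0, 101.0),
    (True,  False, False, 58.3, 13.5, 21.2, 96.2),
    (False, True,  False, 59.8, 12.5, 24.5, 80.7),
    (False, False, True,  57.8, 7.3, 17.0, 95.3),
    (True,  True,  False, 58.2, 13.6, 22.2, 76.4),
    (False, True,  True,  60.2, 12.7, 25.5, 75.2),
    (True,  True,  True,  61.4, 19.0, 30.6, 78.2),
]


def deltas_vs_baseline(rows):
    """Map each module combination to its change relative to the first row."""
    base = rows[0][3:]
    keys = ("AP50", "params", "GFLOPs", "fps")
    result = {}
    for row in rows[1:]:
        label = "+".join(n for n, on in zip(("SC", "MFN", "AFF"), row[:3]) if on)
        result[label] = {k: round(v - b, 1) for k, v, b in zip(keys, row[3:], base)}
    return result


deltas = deltas_vs_baseline(ABLATION)
for label, change in deltas.items():
    print(label, change)
```

With all three modules enabled, AP₅₀ rises by 4.6 percentage points over the first row, while the frame rate drops by 22.8 frame/s and the parameter count grows by 12.0 × 10⁶.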

Neck	AP₅₀/%	AP_S/%	Params
FPN+PANet	56.8	18.8	1.00
NAS-FPN	53.2	—	0.73
BiFPN	55.5	—	0.69
SPD	59.1	21.9	2.18
MFN	61.4	25.9	1.24

Module	AP₅₀/%	Computation/GFLOPs	Params/10⁶
SPP	83.1	16.5	7.22
SPPF	83.4	16.5	7.23
CSPSPPFC	84.7	21.2	13.50
