Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2871-2877. DOI: 10.11772/j.issn.1001-9081.2023091274
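Received: 2023-09-18
Revised: 2023-11-28
Accepted: 2023-12-01
Online: 2024-03-15
Published: 2024-09-10
Corresponding author: YANG Zhe
About the author: PAN Yexin (born 1999), male, from Suzhou, Jiangsu, M. S. candidate, CCF member. His research interests include computer vision and deep learning.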
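Abstract:
Small object detection has long been a difficult problem in object detection, because small objects carry few features of their own and those features are easily lost as the network deepens. To address this, a model is proposed that optimizes small object detection by applying feature enhancement several times within the network structure. First, the Spatial Pyramid Pooling (SPP) module in the backbone is replaced to optimize gradient computation. Second, multi-level bidirectional fusion that distinguishes between feature levels is applied in the network neck, and an Adaptive Feature Fusion (AFF) module is added to the output heads, so that features are enhanced at multiple levels. Experimental results on the COCO2017-val dataset show that, at an Intersection over Union (IoU) threshold of 0.5, the proposed model reaches a mean Average Precision (mAP) of 61.4%, which is 4.7 percentage points higher than the popular YOLOv7 model. The model also runs at 78.2 frame/s on a single GPU, which meets industrial detection speed requirements.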
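The IoU threshold of 0.5 decides whether a predicted box counts as a correct detection when AP50 is computed. The following is a minimal Python sketch of that check, assuming boxes in (x1, y1, x2, y2) corner format. The function name `box_iou` and the sample boxes are illustrative assumptions and do not come from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the same 2-pixel offset applied to a small and a large box
small_iou = box_iou((0, 0, 10, 10), (2, 2, 12, 12))      # about 0.47
large_iou = box_iou((0, 0, 100, 100), (2, 2, 102, 102))  # about 0.92

# A prediction is matched to the ground truth only if IoU >= 0.5
print(small_iou >= 0.5, large_iou >= 0.5)
```

In this toy case the small box falls below the 0.5 threshold while the large box stays well above it, although both predictions are off by the same two pixels.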
Yexin PAN, Zhe YANG. Optimization model for small object detection based on multi-level feature bidirectional fusion[J]. Journal of Computer Applications, 2024, 44(9): 2871-2877.
Model | d (input size) | AP50/% | AP/% | APS/% | APM/% | APL/% |
---|---|---|---|---|---|---|
SSD | 300 | 43.1 | 25.1 | 6.6 | 22.6 | 35.5 |
DSSD | 321 | 53.3 | 33.2 | 13.0 | 35.4 | 51.1 |
YOLOv3 | 416 | 44.0 | 21.6 | 5.0 | 22.4 | 35.5 |
RetinaNet | ~500 | 55.7 | 34.7 | 18.3 | 38.2 | 47.1 |
OHEM++ | ~600 | 45.9 | 25.5 | 7.4 | 27.7 | 40.3 |
CoupleNet | 600 | 54.8 | 34.3 | 13.4 | 38.1 | 52.0 |
YOLOv5 | 640 | 56.8 | 37.4 | — | — | — |
YOLOv7 | 640 | 56.7 | 41.7 | 18.8 | 42.4 | 51.9 |
YOLOv8 | 640 | 61.0 | 44.5 | 25.3 | 45.8 | 56.4 |
ViDT | — | 59.6 | 43.3 | 23.2 | 42.5 | 55.8 |
Proposed model | 640 | 61.4 | 44.1 | 25.9 | 44.6 | 55.6 |
Tab. 1 Results comparison of different models on COCO2017-val dataset
Model | P/% | R/% | AP50/% | AP/% | Frame rate/(frame·s⁻¹) |
---|---|---|---|---|---|
Faster R-CNN | 73.2 | 55.2 | 73.2 | 44.0 | 7.0 |
SSD | 76.8 | 59.4 | 76.8 | 45.6 | 44.3 |
YOLOv3 | 77.2 | 52.5 | 77.2 | 39.8 | 74.0 |
YOLOv4 | 80.4 | 62.3 | 72.7 | 46.1 | 54.0 |
YOLOv5 | 79.9 | 77.9 | 83.1 | 59.4 | 90.9 |
YOLOv7 | 84.9 | 75.1 | 86.1 | 60.1 | 92.0 |
Proposed model | 83.7 | 81.0 | 87.8 | 63.9 | 78.2 |
Tab. 2 Results comparison of different models on PASCAL VOC dataset
Model | Backbone | AP | AP50 | AP75 |
---|---|---|---|---|
RRNet | ResNet-50 | 32.9 | 55.8 | 31.3 |
GLSAN | ResNet-50 | 30.7 | 54.3 | 30.0 |
CascadeNet | ResNet-50 | 30.1 | 58.0 | 27.5 |
TridentNet | ResNet-101 | 22.5 | 43.3 | 20.5 |
MPFPN | ResNet-101 | 29.1 | 54.4 | 27.0 |
ClusDet | ResNeXt-101 | 32.4 | 56.2 | 31.6 |
QueryDet | ResNeXt-101 | 33.9 | 56.1 | 34.9 |
SAIC-FPN | ResNeXt-101 | 35.7 | 63.0 | 35.1 |
Proposed model | CSPDarkNet | 34.3 | 61.8 | 33.2 |
Tab. 3 Results comparison of different models on VisDrone2019 dataset (%)
Module | AP50/% | Computation/GFLOPs | Parameters/10⁶ |
---|---|---|---|
SPP | 83.1 | 16.5 | 7.22 |
SPPF | 83.4 | 16.5 | 7.23 |
CSPSPPFC | 84.7 | 21.2 | 13.50 |
Tab. 4 Experimental results of different spatial pyramid pooling modules on PASCAL VOC dataset
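Tab. 4 compares SPP with SPPF. As background, the sketch below uses the definitions popularized by YOLOv5, which are an assumption here and are not taken from the paper: SPP applies max pooling with kernel sizes 5, 9 and 13 in parallel, while SPPF chains three 5×5 poolings. With stride 1 and padding that ignores out-of-bounds positions, the pooled outputs are identical. The convolution layers and the channel concatenation around the pooling are omitted, and the paper's CSPSPPFC module is not covered.

```python
NEG_INF = float("-inf")


def max_pool2d(grid, k):
    """Stride-1 max pooling over a k x k window; positions outside the grid
    are ignored, which is equivalent to padding with -inf. Output keeps the input size."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = NEG_INF
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        best = max(best, grid[y][x])
            row.append(best)
        out.append(row)
    return out


def spp_branches(grid):
    """Parallel pooling: kernels 5, 9 and 13 applied to the same input."""
    return [max_pool2d(grid, k) for k in (5, 9, 13)]


def sppf_branches(grid):
    """Serial pooling: three chained 5x5 poolings, each reusing the previous output."""
    p1 = max_pool2d(grid, 5)
    p2 = max_pool2d(p1, 5)
    p3 = max_pool2d(p2, 5)
    return [p1, p2, p3]


# Deterministic 16x16 toy feature map (values are arbitrary)
grid = [[(7 * i * i + 13 * j + 3 * i * j) % 101 for j in range(16)] for i in range(16)]
same = spp_branches(grid) == sppf_branches(grid)
print(same)
```

Two chained 5×5 poolings cover a 9×9 neighbourhood and three cover 13×13, which is why the serial form can reuse intermediate results instead of scanning the larger windows directly.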
| SC | MFN | AFF | AP50/% | Parameters/10⁶ | Computation/GFLOPs | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
|  |  |  | 56.8 | 7.0 | 16.0 | 101.0 |
| √ |  |  | 58.3 | 13.5 | 21.2 | 96.2 |
|  | √ |  | 59.8 | 12.5 | 24.5 | 80.7 |
|  |  | √ | 57.8 | 7.3 | 17.0 | 95.3 |
| √ |  | √ | 58.2 | 13.6 | 22.2 | 76.4 |
|  | √ | √ | 60.2 | 12.7 | 25.5 | 75.2 |
| √ | √ | √ | 61.4 | 19.0 | 30.6 | 78.2 |
Tab. 5 Comparison results of different enhancement strategies on COCO2017-val dataset
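The trade-off in Tab. 5 is easier to read as differences from the baseline row. The sketch below copies the AP50, parameter, computation and frame-rate values from Tab. 5 and computes those differences. Which strategies each row enables is an assumption: the single-strategy rows are taken to follow the column order SC, MFN, AFF, and the two-strategy rows are matched by their parameter and GFLOPs totals. The helper name `deltas_vs_baseline` is illustrative.

```python
# (enabled strategies, AP50 in %, parameters in 10^6, GFLOPs, frame/s) from Tab. 5.
# The strategy labels per row are an assumption inferred from the table layout.
ROWS = [
    ((), 56.8, 7.0, 16.0, 101.0),
    (("SC",), 58.3, 13.5, 21.2, 96.2),
    (("MFN",), 59.8, 12.5, 24.5, 80.7),
    (("AFF",), 57.8, 7.3, 17.0, 95.3),
    (("SC", "AFF"), 58.2, 13.6, 22.2, 76.4),
    (("MFN", "AFF"), 60.2, 12.7, 25.5, 75.2),
    (("SC", "MFN", "AFF"), 61.4, 19.0, 30.6, 78.2),
]


def deltas_vs_baseline(rows):
    """Return the AP50 gain, extra GFLOPs and frame-rate drop of each row
    relative to the first (baseline) row."""
    _, base_ap50, _, base_gflops, base_fps = rows[0]
    out = []
    for names, ap50, _params, gflops, fps in rows[1:]:
        out.append({
            "config": "+".join(names),
            "ap50_gain": round(ap50 - base_ap50, 1),
            "extra_gflops": round(gflops - base_gflops, 1),
            "fps_drop": round(base_fps - fps, 1),
        })
    return out


deltas = deltas_vs_baseline(ROWS)
for d in deltas:
    print(d)
```

For the full configuration this gives a gain of 4.6 percentage points in AP50 over the baseline row, at a cost of 14.6 additional GFLOPs and 22.8 frame/s.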
Neck | AP50/% | APS/% | Parameters |
---|---|---|---|
FPN+PANet | 56.8 | 18.8 | 1.00 |
NAS-FPN | 53.2 | — | 0.73 |
BiFPN | 55.5 | — | 0.69 |
SPD | 59.1 | 21.9 | 2.18 |
MFN | 61.4 | 25.9 | 1.24 |
Tab. 6 Performance comparison of different feature enhancement networks
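Among the necks in Tab. 6, FPN+PANet is a common baseline that fuses feature levels with a top-down pass followed by a bottom-up pass. The sketch below illustrates that general bidirectional idea on 1-D lists of numbers. It is not the paper's MFN: the nearest-neighbour upsampling, pairwise-max downsampling, element-wise addition, and all function names are simplifying assumptions made for illustration.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by 2: [a, b] -> [a, a, b, b]."""
    return [v for v in x for _ in range(2)]


def downsample2(x):
    """Downsampling by 2, keeping the max of each adjacent pair."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def add(x, y):
    """Element-wise sum of two equally long feature lists."""
    return [a + b for a, b in zip(x, y)]


def bidirectional_fuse(p3, p4, p5):
    """Fuse three levels (fine p3, middle p4, coarse p5) in two passes."""
    # Top-down pass: coarse information flows to finer levels
    t4 = add(p4, upsample2(p5))
    t3 = add(p3, upsample2(t4))
    # Bottom-up pass: fine detail flows back to coarser levels
    n3 = t3
    n4 = add(t4, downsample2(n3))
    n5 = add(p5, downsample2(n4))
    return n3, n4, n5


# Toy input: one fine-level response (1) and one coarse-level response (5)
p3 = [1, 0, 0, 0, 0, 0, 0, 0]
p4 = [0, 0, 0, 0]
p5 = [0, 5]
n3, n4, n5 = bidirectional_fuse(p3, p4, p5)
print(n3, n4, n5)
```

After both passes the fine-level response is visible at the coarsest level and the coarse-level response is visible at the finest level, which is the property that bidirectional necks rely on.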