Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2871-2877. DOI: 10.11772/j.issn.1001-9081.2023091274
• Multimedia computing and computer simulation •
Received: 2023-09-18
Revised: 2023-11-28
Accepted: 2023-12-01
Online: 2024-03-15
Published: 2024-09-10
Contact: Zhe YANG
About author: PAN Yexin, born in 1999 in Suzhou, Jiangsu, M. S. candidate, CCF member. His research interests include computer vision and deep learning.
Yexin PAN, Zhe YANG. Optimization model for small object detection based on multi-level feature bidirectional fusion[J]. Journal of Computer Applications, 2024, 44(9): 2871-2877.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023091274
Model | Input size d | AP50/% | AP/% | APS/% | APM/% | APL/% |
---|---|---|---|---|---|---|
SSD | 300 | 43.1 | 25.1 | 6.6 | 22.6 | 35.5 |
DSSD | 321 | 53.3 | 33.2 | 13.0 | 35.4 | 51.1 |
YOLOv3 | 416 | 44.0 | 21.6 | 5.0 | 22.4 | 35.5 |
RetinaNet | ~500 | 55.7 | 34.7 | 18.3 | 38.2 | 47.1 |
OHEM++ | ~600 | 45.9 | 25.5 | 7.4 | 27.7 | 40.3 |
CoupleNet | 600 | 54.8 | 34.3 | 13.4 | 38.1 | 52.0 |
YOLOv5 | 640 | 56.8 | 37.4 | — | — | — |
YOLOv7 | 640 | 56.7 | 41.7 | 18.8 | 42.4 | 51.9 |
YOLOv8 | 640 | 61.0 | 44.5 | 25.3 | 45.8 | 56.4 |
ViDT | — | 59.6 | 43.3 | 23.2 | 42.5 | 55.8 |
Proposed model | 640 | 61.4 | 44.1 | 25.9 | 44.6 | 55.6 |
Tab. 1 Results comparison of different models on COCO2017-val dataset
Model | P/% | R/% | AP50/% | AP/% | Frame rate/(frame·s⁻¹) |
---|---|---|---|---|---|
Faster R-CNN | 73.2 | 55.2 | 73.2 | 44.0 | 7.0 |
SSD | 76.8 | 59.4 | 76.8 | 45.6 | 44.3 |
YOLOv3 | 77.2 | 52.5 | 77.2 | 39.8 | 74.0 |
YOLOv4 | 80.4 | 62.3 | 72.7 | 46.1 | 54.0 |
YOLOv5 | 79.9 | 77.9 | 83.1 | 59.4 | 90.9 |
YOLOv7 | 84.9 | 75.1 | 86.1 | 60.1 | 92.0 |
Proposed model | 83.7 | 81.0 | 87.8 | 63.9 | 78.2 |
Tab. 2 Results comparison of different models on PASCAL VOC dataset
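Tab. 2 reports precision (P) and recall (R) separately. As a reading aid, the minimal sketch below combines them into an F1 score, the harmonic mean of P and R. F1 is not reported in the source; the values are derived here purely from the P and R columns, and the names `f1_score`, `TAB2_PR`, and `f1_by_model` are hypothetical, chosen only for illustration. "Proposed model" denotes the last row of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P/%, R/%) pairs copied from Tab. 2. F1 is NOT in the source table;
# it is derived below as an illustrative assumption-free calculation.
TAB2_PR = {
    "Faster R-CNN": (73.2, 55.2),
    "SSD": (76.8, 59.4),
    "YOLOv3": (77.2, 52.5),
    "YOLOv4": (80.4, 62.3),
    "YOLOv5": (79.9, 77.9),
    "YOLOv7": (84.9, 75.1),
    "Proposed model": (83.7, 81.0),
}

f1_by_model = {name: f1_score(p, r) for name, (p, r) in TAB2_PR.items()}

if __name__ == "__main__":
    # Print models from highest to lowest derived F1.
    ranked = sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True)
    for name, f1 in ranked:
        print(f"{name:<15} F1 = {f1:.1f}%")
```

Because the harmonic mean penalizes an imbalance between P and R, a detector with high precision but low recall scores lower than one with both values moderately high.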
Model | Backbone | AP | AP50 | AP75 |
---|---|---|---|---|
RRNet | ResNet-50 | 32.9 | 55.8 | 31.3 |
GLSAN | ResNet-50 | 30.7 | 54.3 | 30.0 |
CascadeNet | ResNet-50 | 30.1 | 58.0 | 27.5 |
TridentNet | ResNet-101 | 22.5 | 43.3 | 20.5 |
MPFPN | ResNet-101 | 29.1 | 54.4 | 27.0 |
ClusDet | ResNeXt-101 | 32.4 | 56.2 | 31.6 |
QueryDet | ResNeXt-101 | 33.9 | 56.1 | 34.9 |
SAIC-FPN | ResNeXt-101 | 35.7 | 63.0 | 35.1 |
Proposed model | CSPDarkNet | 34.3 | 61.8 | 33.2 |
Tab. 3 Results comparison of different models on VisDrone2019 dataset
Module | AP50/% | Computation/GFLOPs | Parameters/10⁶ |
---|---|---|---|
SPP | 83.1 | 16.5 | 7.22 |
SPPF | 83.4 | 16.5 | 7.23 |
CSPSPPFC | 84.7 | 21.2 | 13.50 |
Tab. 4 Experimental results of different spatial pyramid poolings on PASCAL VOC dataset
| SC | MFN | AFF | AP50/% | Parameters/10⁶ | Computation/GFLOPs | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
|  |  |  | 56.8 | 7.0 | 16.0 | 101.0 |
| √ |  |  | 58.3 | 13.5 | 21.2 | 96.2 |
|  | √ |  | 59.8 | 12.5 | 24.5 | 80.7 |
|  |  | √ | 57.8 | 7.3 | 17.0 | 95.3 |
| √ |  | √ | 58.2 | 13.6 | 22.2 | 76.4 |
|  | √ | √ | 60.2 | 12.7 | 25.5 | 75.2 |
| √ | √ | √ | 61.4 | 19.0 | 30.6 | 78.2 |
Tab. 5 Comparison results of different enhancement strategies on COCO2017-val dataset
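To put the ablation in Tab. 5 in perspective, the sketch below compares only the first row (no enhancement module enabled) with the last row (all three modules enabled) and summarizes the accuracy gain against the added cost. The figures are copied from those two rows; the derived ratios are not stated in the source, and the names `AblationRow`, `BASELINE`, `FULL`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationRow:
    ap50: float      # AP50 in percent
    params_m: float  # parameters, in millions (10^6)
    gflops: float    # computation, in GFLOPs
    fps: float       # frame rate, in frames per second


# First and last rows of Tab. 5 (no modules vs. all three modules).
BASELINE = AblationRow(ap50=56.8, params_m=7.0, gflops=16.0, fps=101.0)
FULL = AblationRow(ap50=61.4, params_m=19.0, gflops=30.6, fps=78.2)


def summarize(base: AblationRow, full: AblationRow) -> dict:
    """Derive gain/cost figures; these ratios are not reported in the source."""
    return {
        "ap50_gain_points": full.ap50 - base.ap50,
        "params_ratio": full.params_m / base.params_m,
        "gflops_ratio": full.gflops / base.gflops,
        "fps_drop_percent": 100 * (base.fps - full.fps) / base.fps,
    }


summary = summarize(BASELINE, FULL)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

Read this way, the table describes a trade-off: a gain of several AP50 points in exchange for a larger, slower model that still runs above 75 frame·s⁻¹.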
Neck | AP50/% | APS/% | Parameters |
---|---|---|---|
FPN+PANet | 56.8 | 18.8 | 1.00 |
NAS-FPN | 53.2 | — | 0.73 |
BiFPN | 55.5 | — | 0.69 |
SPD | 59.1 | 21.9 | 2.18 |
MFN | 61.4 | 25.9 | 1.24 |
Tab. 6 Performance comparison of different feature enhancement networks