Journal of Computer Applications (official website) ›› 2023, Vol. 43 ›› Issue (8): 2619-2629. DOI: 10.11772/j.issn.1001-9081.2022081207
Received:
2022-09-01
Revised:
2022-11-07
Accepted:
2022-11-14
Online:
2023-01-11
Published:
2023-08-10
Contact:
Xinyu CHENG
About author:
DUAN Shengwei, born in 1996 in Panzhihua, Sichuan, M. S. candidate. His research interests include computer vision and object detection.
Shengwei DUAN1, Xinyu CHENG1(), Haozhou WANG1, Fei WANG2
Abstract:
To address the problem that dam inspection currently relies mainly on manual on-site patrols, which are costly and inefficient, an improved detection algorithm based on YOLOv5 was proposed. First, the backbone was improved with a modified multi-scale Vision Transformer structure: the multi-scale global information associated by the Transformer and the local information extracted by the Convolutional Neural Network (CNN) were combined into aggregated features, so that multi-scale semantic and positional information was fully exploited to strengthen the feature extraction ability of the network. Then, a coordinate attention mechanism was added before each feature detection layer to encode features along the height and width directions of the image, and the encoded features were used to build long-range dependencies among pixels on the feature map, thereby enhancing target localization in complex environments. Next, the sampling algorithm for positive and negative training samples was improved: by computing the average fitness and the difference between prior boxes and ground-truth boxes to screen samples, candidate positive samples were guided to respond to prior boxes of similar shape, helping the network converge faster and better and improving its overall performance and generalization. Finally, the network was lightweighted for the application requirements, and its structure was optimized through pruning and structural re-parameterization. Experimental results show that, on the dam disease dataset used, the improved network raises mAP@0.5 by 10.5 percentage points and mAP@0.5:0.95 by 17.3 percentage points over the original YOLOv5s; compared with the network before lightweighting, the lightweighted network reduces the number of parameters and the computational cost by 24% and 13% respectively while increasing detection speed by 42%, meeting the precision and speed requirements for disease detection in the current application scenario.
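The lightweighting step above relies on structural re-parameterization: a convolution followed by BatchNorm used during training is folded into a single equivalent convolution for deployment. A minimal single-channel NumPy sketch of that folding identity (function names are illustrative, not from the paper):

```python
import numpy as np

def conv2d(x, w, b):
    # Naive valid-padding 2D convolution for one channel:
    # x: (H, W), w: (k, k), b: scalar bias.
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w) + b
    return out

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN(conv(x)) = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    # is linear in the conv output, so with s = gamma / sqrt(var + eps):
    # BN(conv(x; w, b)) = conv(x; w * s, (b - mean) * s + beta)
    s = gamma / np.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
w, b = rng.normal(size=(3, 3)), 0.5
gamma, beta, mean, var = 1.7, -0.3, 0.2, 1.1

# Training-time path: conv then BatchNorm (running statistics fixed).
y_train = gamma * (conv2d(x, w, b) - mean) / np.sqrt(var + 1e-5) + beta
# Deploy-time path: one folded convolution, numerically identical.
wf, bf = fold_bn(w, b, gamma, beta, mean, var)
y_deploy = conv2d(x, wf, bf)
assert np.allclose(y_train, y_deploy)
```

The same algebra extends channel-wise to multi-channel convolutions, which is how re-parameterized blocks collapse their training branches at inference time.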
Shengwei DUAN, Xinyu CHENG, Haozhou WANG, Fei WANG. Dam surface disease detection algorithm based on improved YOLOv5[J]. Journal of Computer Applications, 2023, 43(8): 2619-2629.
| Class | Training set | Validation set | Test set |
| --- | --- | --- | --- |
| seepage | 1 473 | 491 | 491 |
| crack00 | 1 496 | 498 | 498 |
| crack01 | 1 344 | 448 | 448 |
| crack02 | 1 391 | 464 | 464 |

Tab. 1 Division of dam surface disease dataset
| Model | AP/%: seepage | AP/%: crack00 | AP/%: crack01 | AP/%: crack02 | mAP/% |
| --- | --- | --- | --- | --- | --- |
| YOLOv5s | 61.7 | 85.7 | 43.8 | 89.1 | 70.1 |
| YOLO-MT-CA | 71.9 | 86.4 | 54.9 | 98.9 | 78.3 |

Tab. 2 Comparison of results of YOLOv5s and YOLO-MT-CA models
| Model | AP: seepage | AP: crack00 | AP: crack01 | AP: crack02 | mAP |
| --- | --- | --- | --- | --- | --- |
| YOLO-MT-CA | 71.9 | 86.4 | 54.9 | 98.9 | 78.3 |
| BPSS | 72.1 | 87.0 | 65.1 | 99.1 | 80.8 |

Tab. 3 Comparison of results of YOLO-MT-CA and BPSS models (unit: %)
| No. | Data augmentation | MT | CA | BPSS | Lightweighting | Model size/MB | Computational cost/GFLOPs | mAP/% | Precision/% | Recall/% | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | | 14.4 | 15.9 | 64.3 | 67.9 | 62.7 | 43 |
| 2 | √ | | | | | 14.4 | 15.9 | 70.1 | 74.5 | 69.8 | 43 |
| 3 | √ | √ | | | | 17.6 | 16.6 | 74.9 | 78.7 | 73.7 | 38 |
| 4 | √ | √ | √ | | | 17.9 | 16.7 | 78.3 | 79.4 | 74.4 | 38 |
| 5 | √ | √ | √ | √ | | 17.9 | 16.7 | 80.8 | 81.0 | 75.9 | 38 |
| 6 | √ | √ | √ | √ | √ | 13.6 | 14.5 | 80.6 | 80.5 | 75.8 | 54 |

Tab. 4 Effects of different improvement methods on detector performance
| No. | From (input layer) | Number | Params | Module | Arguments |
| --- | --- | --- | --- | --- | --- |
| 0 | -1 | 1 | 3 520 | Conv | [3, 32, 6, 2, 2] |
| 1 | -1 | 1 | 20 736 | Rep_Block | [32, 64, 3, 2] |
| 2 | -1 | 1 | 18 816 | C3 | [64, 64, 1] |
| 3 | -1 | 1 | 82 432 | Rep_Block | [64, 128, 3, 2] |
| 4 | -1 | 1 | 74 560 | MT_Block | [128, 128, 1] |
| 5 | -1 | 1 | 328 704 | Rep_Block | [128, 256, 3, 2] |
| 6 | -1 | 1 | 296 576 | MT_Block | [256, 256, 1] |
| 7 | -1 | 1 | 1 312 768 | Rep_Block | [256, 512, 3, 2] |
| 8 | -1 | 1 | 1 182 976 | MT_Block | [512, 512, 1] |
| 9 | -1 | 1 | 656 896 | SPPF | [512, 512, 5] |
| 10 | -1 | 1 | 131 584 | Conv | [512, 256, 1, 1] |
| 11 | -1 | 1 | 0 | Upsampling | [None, 2, 'nearest'] |
| 12 | [ | 1 | 0 | Concat | [ |
| 13 | -1 | 1 | 361 984 | C3 | [512, 256, 1, False] |
| 14 | -1 | 1 | 33 024 | Conv | [256, 128, 1, 1] |
| 15 | -1 | 1 | 0 | Upsampling | [None, 2, 'nearest'] |
| 16 | [ | 1 | 0 | Concat | [ |
| 17 | -1 | 1 | 90 880 | C3 | [256, 128, 1, False] |
| 18 | -1 | 1 | 6 448 | CA_Block | [128, 128, 8] |
| 19 | -1 | 1 | 147 712 | Conv | [128, 128, 3, 2] |
| 20 | [ | 1 | 0 | Concat | [ |
| 21 | -1 | 1 | 296 448 | C3 | [256, 256, 1, False] |
| 22 | -1 | 1 | 12 848 | CA_Block | [256, 256, 16] |
| 23 | -1 | 1 | 590 336 | Conv | [256, 256, 3, 2] |
| 24 | [ | 1 | 0 | Concat | [ |
| 25 | -1 | 1 | 1 182 720 | C3 | [512, 512, 1, False] |
| 26 | -1 | 1 | 25 648 | CA_Block | [512, 512, 32] |
| 27 | [ | 1 | 24 273 | Detect | [4, [[10,13, 16,30, 33,23], [30,61, 62,45, 59,119], [116,90, 156,198, 373,326]], [128, 256, 512]] |

Tab. 5 Network structure and parameter settings of the model
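The CA_Block layers in the structure table implement coordinate attention (reference [27]): features are pooled separately along the height and width axes, transformed, and turned into two directional gates that rescale the input. A minimal NumPy sketch of that pattern, with the learned 1×1 convolutions replaced by arbitrary fixed matrices `w_h` and `w_w` for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_h, w_w):
    # x: (C, H, W). Coordinate attention factorizes global pooling into
    # two 1D poolings so positional information survives along each axis.
    pool_h = x.mean(axis=2)          # (C, H): average over width
    pool_w = x.mean(axis=1)          # (C, W): average over height
    a_h = sigmoid(w_h @ pool_h)      # (C, H) gate along the height axis
    a_w = sigmoid(w_w @ pool_w)      # (C, W) gate along the width axis
    return x * a_h[:, :, None] * a_w[:, None, :]

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6, 6))       # toy feature map, 4 channels
w_h = rng.normal(size=(4, 4)) * 0.1  # stand-ins for learned transforms
w_w = rng.normal(size=(4, 4)) * 0.1
y = coordinate_attention(x, w_h, w_w)
assert y.shape == x.shape
```

Because each gate lies in (0, 1) and is indexed by a single spatial coordinate, the block can emphasize whole rows or columns of the feature map, which matches the long-range localization behavior described in the abstract.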
| UAV distance to dam surface/crest/m | Coverage area per frame/% | Recall/% | Precision/% | F1-score |
| --- | --- | --- | --- | --- |
| [10, 20) | 11 | 82.7 | 91.7 | 0.870 |
| [20, 30) | 20 | 81.6 | 91.2 | 0.861 |
| [30, 40) | 52 | 75.3 | 88.2 | 0.812 |
| [40, 50) | 97 | 58.1 | 69.6 | 0.633 |

Tab. 6 Comparison of detection performance for different distances from model to dam surface/dam top
| UAV distance to wave wall and similar regions/m | Coverage area per frame/% | Recall/% | Precision/% | F1-score |
| --- | --- | --- | --- | --- |
| [5, 10) | 7 | 87.6 | 93.2 | 0.903 |
| [10, 15) | 10 | 87.4 | 93.1 | 0.902 |
| [15, 20) | 15 | 75.8 | 89.6 | 0.821 |
| [20, 25) | 32 | 61.9 | 83.2 | 0.710 |

Tab. 7 Comparison of detection performance for different distances from model to regions such as wave wall
| No. | Model | Input size | mAP@0.5 | mAP@0.5:0.95 | Model size/MB | Computational cost/GFLOPs | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Faster R-CNN | 640×640 | 0.727 | 0.424 | 315.00 | 276.1 | 5 |
| 2 | RetinaNet | 640×640 | 0.739 | 0.453 | 277.20 | 220.0 | 26 |
| 3 | Cascade-Swin Transformer | 640×640 | 0.772 | 0.499 | 219.00 | 267.0 | 12 |
| 4 | SSD-Lite | 640×640 | 0.713 | 0.404 | 6.30 | 7.6 | 63 |
| 5 | EfficientDet | 640×640 | 0.724 | 0.398 | 15.10 | 16.9 | 25 |
| 6 | YOLOX-s | 640×640 | 0.696 | 0.315 | 17.70 | 26.8 | 33 |
| 7 | PPYOLOE-s | 640×640 | 0.717 | 0.376 | 15.90 | 17.4 | 51 |
| 8 | YOLOv6s | 640×640 | 0.731 | 0.372 | 28.40 | 44.2 | 37 |
| 9 | YOLOv7-tiny | 640×640 | 0.718 | 0.353 | 12.10 | 13.9 | 61 |
| 10 | YOLOv3-tiny | 640×640 | 0.695 | 0.322 | 16.60 | 12.9 | 56 |
| 11 | YOLOv5s-mobilenetv3 | 640×640 | 0.688 | 0.356 | 6.90 | 7.9 | 57 |
| 12 | YOLOv5s-ghostnet | 640×640 | 0.719 | 0.443 | 7.34 | 8.1 | 42 |
| 13 | YOLOv5s-shufflenet | 640×640 | 0.687 | 0.363 | 3.30 | 4.7 | 78 |
| 14 | YOLOv5s-Transformer | 640×640 | 0.741 | 0.446 | 15.60 | 17.6 | 40 |
| 15 | YOLOv5s | 640×640 | 0.701 | 0.328 | 14.40 | 16.6 | 43 |
| 16 | YOLOv5s6 | 640×640 | 0.729 | 0.411 | 23.80 | 16.2 | 41 |
| 17 | Proposed model | 640×640 | 0.806 | 0.501 | 13.60 | 14.5 | 54 |

Tab. 8 Comparison of different detection models
1 QIAN Z Y. Comprehensive report of strategic research on sustainable development of water resource in China[C]// Proceedings of the 2001 Annual Academic Conference of Chinese Hydraulic Engineering Society. Beijing: China Water and Power Press, 2001: 3-18.
2 LIU C D, XIANG Y, ZHANG S C, et al. The design and implementation of intelligent inspection system of reservoir dams based on big data[J]. China Water Resources, 2018(20): 39-41. 10.3969/j.issn.1000-1123.2018.20.010
3 WU Z R, GU C S, SHEN Z Z, et al. Theory and application of dam safety synthetical analysis and assessment[J]. Advances in Science and Technology of Water Resources, 1998, 18(3): 2-6, 65.
4 NISHIKAWA T, YOSHIDA J, SUGIYAMA T, et al. Concrete crack detection by multiple sequential image filtering[J]. Computer-Aided Civil and Infrastructure Engineering, 2012, 27(1): 29-47. 10.1111/j.1467-8667.2011.00716.x
5 WANG H L, QI X L, WU G S. Research progress of object detection technology based on convolutional neural network in deep learning[J]. Computer Science, 2018, 45(9): 11-19.
6 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. 10.1109/cvpr.2014.81
7 GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. 10.1109/iccv.2015.169
8 REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015, 1: 91-99.
9 TAO X, ZHANG D P, WANG Z H, et al. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(4): 1486-1498. 10.1109/tsmc.2018.2871750
10 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
11 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91
12 REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. 10.1109/cvpr.2017.690
13 REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2022-03-20].
14 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2022-03-23].
15 Ultralytics. YOLOv5[EB/OL]. [2022-03-23].
16 GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06) [2022-09-23].
17 LI C Y, LI L L, JIANG H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07) [2022-09-23].
18 WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. (2022-07-06) [2022-09-23]. 10.48550/arXiv.2207.02696
19 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. 10.1109/iccv.2017.324
20 TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787. 10.1109/cvpr42600.2020.01079
21 XU X H, ZHENG H, GUO Z Y, et al. SDD-CNN: small data-driven convolution neural networks for subtle roller defect inspection[J]. Applied Sciences, 2019, 9(7): No.1364. 10.3390/app9071364
22 ZHANG C B, CHANG C C, JAMSHIDI M. Bridge damage detection using a single-stage detector and field inspection images[EB/OL]. (2019-02-23) [2022-09-21]. 10.1111/mice.12500
23 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
24 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-04-12].
25 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10012-10022. 10.1109/iccv48922.2021.00986
26 CHEN C F R, FAN Q F, PANDA R. CrossViT: cross-attention multi-scale vision transformer for image classification[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 347-356. 10.1109/iccv48922.2021.00041
27 HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717. 10.1109/cvpr46437.2021.01350
28 ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9756-9765. 10.1109/cvpr42600.2020.00978
29 ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 12993-13000. 10.1609/aaai.v34i07.6999
30 HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
31 HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979, 28(1): 100-108. 10.2307/2346830
32 HAN H, WANG W Y, MAO B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]// Proceedings of the 2005 International Conference on Intelligent Computing, LNCS 3644. Berlin: Springer, 2005: 878-887.
33 DING X H, ZHANG X Y, HAN J G, et al. Diverse branch block: building a convolution as an inception-like unit[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10881-10890. 10.1109/cvpr46437.2021.01074
34 HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2022-04-12]. 10.48550/arXiv.1704.04861
35 HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. 10.1109/cvpr42600.2020.00165
36 ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856. 10.1109/cvpr.2018.00716