Journal of Computer Applications (official website) ›› 2023, Vol. 43 ›› Issue (8): 2619-2629. DOI: 10.11772/j.issn.1001-9081.2022081207
Received:
2022-09-01
Revised:
2022-11-07
Accepted:
2022-11-14
Online:
2023-01-11
Published:
2023-08-10
Contact:
Xinyu CHENG
About author:
DUAN Shengwei, born in 1996 in Panzhihua, Sichuan, M. S. candidate. His research interests include computer vision and object detection.
Shengwei DUAN1, Xinyu CHENG1(), Haozhou WANG1, Fei WANG2
Abstract:
To address the problem that dam inspection currently relies mainly on manual on-site patrols, which are costly and inefficient, an improved detection algorithm based on YOLOv5 was proposed. First, the backbone was improved with a modified multi-scale Vision Transformer structure: the multi-scale global information associated by the Transformer and the local information extracted by the Convolutional Neural Network (CNN) were combined into aggregated features, so that multi-scale semantic and positional information was fully exploited to strengthen the feature extraction ability of the network. Then, a coordinate attention mechanism was added before each feature detection layer to encode features along the height and width directions of the image, and the encoded features were used to build long-range dependencies among pixels on the feature map, thereby enhancing target localization in complex environments. Next, the sampling algorithm for positive and negative training samples was improved: by computing the average fitness and the difference between prior boxes and ground-truth boxes to screen samples, candidate positive samples were guided to respond to prior boxes of similar shape, helping the network converge faster and better and improving its overall performance and generalization. Finally, the network was lightweighted for the application requirements, and its structure was optimized through pruning and structural re-parameterization. Experimental results show that, on the dam disease dataset used, the improved network raises mAP@0.5 by 10.5 percentage points and mAP@0.5:0.95 by 17.3 percentage points over the original YOLOv5s; compared with the network before lightweighting, the lightweighted network reduces the number of parameters and the computational cost by 24% and 13% respectively while increasing detection speed by 42%, meeting the precision and speed requirements for disease detection in the current application scenario.
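The lightweighting step above relies on structural re-parameterization: a convolution followed by BatchNorm used during training is folded into a single equivalent convolution for deployment. A minimal single-channel NumPy sketch of that folding identity (function names are illustrative, not from the paper):

```python
import numpy as np

def conv2d(x, w, b):
    # Naive valid-padding 2D convolution for one channel:
    # x: (H, W), w: (k, k), b: scalar bias.
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w) + b
    return out

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN(conv(x)) = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    # is linear in the conv output, so with s = gamma / sqrt(var + eps):
    # BN(conv(x; w, b)) = conv(x; w * s, (b - mean) * s + beta)
    s = gamma / np.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
w, b = rng.normal(size=(3, 3)), 0.5
gamma, beta, mean, var = 1.7, -0.3, 0.2, 1.1

# Training-time path: conv then BatchNorm (running statistics fixed).
y_train = gamma * (conv2d(x, w, b) - mean) / np.sqrt(var + 1e-5) + beta
# Deploy-time path: one folded convolution, numerically identical.
wf, bf = fold_bn(w, b, gamma, beta, mean, var)
y_deploy = conv2d(x, wf, bf)
assert np.allclose(y_train, y_deploy)
```

The same algebra extends channel-wise to multi-channel convolutions, which is how re-parameterized blocks collapse their training branches at inference time.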
Shengwei DUAN, Xinyu CHENG, Haozhou WANG, Fei WANG. Dam surface disease detection algorithm based on improved YOLOv5[J]. Journal of Computer Applications, 2023, 43(8): 2619-2629.
| Class | Training set | Validation set | Test set |
| --- | --- | --- | --- |
| seepage | 1 473 | 491 | 491 |
| crack00 | 1 496 | 498 | 498 |
| crack01 | 1 344 | 448 | 448 |
| crack02 | 1 391 | 464 | 464 |

Tab. 1 Division of dam surface disease dataset
| Model | AP/%: seepage | AP/%: crack00 | AP/%: crack01 | AP/%: crack02 | mAP/% |
| --- | --- | --- | --- | --- | --- |
| YOLOv5s | 61.7 | 85.7 | 43.8 | 89.1 | 70.1 |
| YOLO-MT-CA | 71.9 | 86.4 | 54.9 | 98.9 | 78.3 |

Tab. 2 Comparison of results of YOLOv5s and YOLO-MT-CA models
| Model | AP: seepage | AP: crack00 | AP: crack01 | AP: crack02 | mAP |
| --- | --- | --- | --- | --- | --- |
| YOLO-MT-CA | 71.9 | 86.4 | 54.9 | 98.9 | 78.3 |
| BPSS | 72.1 | 87.0 | 65.1 | 99.1 | 80.8 |

Tab. 3 Comparison of results of YOLO-MT-CA and BPSS models (unit: %)
| No. | Data augmentation | MT | CA | BPSS | Lightweighting | Model size/MB | Computational cost/GFLOPs | mAP/% | Precision/% | Recall/% | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | | 14.4 | 15.9 | 64.3 | 67.9 | 62.7 | 43 |
| 2 | √ | | | | | 14.4 | 15.9 | 70.1 | 74.5 | 69.8 | 43 |
| 3 | √ | √ | | | | 17.6 | 16.6 | 74.9 | 78.7 | 73.7 | 38 |
| 4 | √ | √ | √ | | | 17.9 | 16.7 | 78.3 | 79.4 | 74.4 | 38 |
| 5 | √ | √ | √ | √ | | 17.9 | 16.7 | 80.8 | 81.0 | 75.9 | 38 |
| 6 | √ | √ | √ | √ | √ | 13.6 | 14.5 | 80.6 | 80.5 | 75.8 | 54 |

Tab. 4 Effects of different improvement methods on detector performance
| No. | From (input layer) | Number | Params | Module | Arguments |
| --- | --- | --- | --- | --- | --- |
| 0 | -1 | 1 | 3 520 | Conv | [3, 32, 6, 2, 2] |
| 1 | -1 | 1 | 20 736 | Rep_Block | [32, 64, 3, 2] |
| 2 | -1 | 1 | 18 816 | C3 | [64, 64, 1] |
| 3 | -1 | 1 | 82 432 | Rep_Block | [64, 128, 3, 2] |
| 4 | -1 | 1 | 74 560 | MT_Block | [128, 128, 1] |
| 5 | -1 | 1 | 328 704 | Rep_Block | [128, 256, 3, 2] |
| 6 | -1 | 1 | 296 576 | MT_Block | [256, 256, 1] |
| 7 | -1 | 1 | 1 312 768 | Rep_Block | [256, 512, 3, 2] |
| 8 | -1 | 1 | 1 182 976 | MT_Block | [512, 512, 1] |
| 9 | -1 | 1 | 656 896 | SPPF | [512, 512, 5] |
| 10 | -1 | 1 | 131 584 | Conv | [512, 256, 1, 1] |
| 11 | -1 | 1 | 0 | Upsampling | [None, 2, 'nearest'] |
| 12 | [ | 1 | 0 | Concat | [ |
| 13 | -1 | 1 | 361 984 | C3 | [512, 256, 1, False] |
| 14 | -1 | 1 | 33 024 | Conv | [256, 128, 1, 1] |
| 15 | -1 | 1 | 0 | Upsampling | [None, 2, 'nearest'] |
| 16 | [ | 1 | 0 | Concat | [ |
| 17 | -1 | 1 | 90 880 | C3 | [256, 128, 1, False] |
| 18 | -1 | 1 | 6 448 | CA_Block | [128, 128, 8] |
| 19 | -1 | 1 | 147 712 | Conv | [128, 128, 3, 2] |
| 20 | [ | 1 | 0 | Concat | [ |
| 21 | -1 | 1 | 296 448 | C3 | [256, 256, 1, False] |
| 22 | -1 | 1 | 12 848 | CA_Block | [256, 256, 16] |
| 23 | -1 | 1 | 590 336 | Conv | [256, 256, 3, 2] |
| 24 | [ | 1 | 0 | Concat | [ |
| 25 | -1 | 1 | 1 182 720 | C3 | [512, 512, 1, False] |
| 26 | -1 | 1 | 25 648 | CA_Block | [512, 512, 32] |
| 27 | [ | 1 | 24 273 | Detect | [4, [[10,13, 16,30, 33,23], [30,61, 62,45, 59,119], [116,90, 156,198, 373,326]], [128, 256, 512]] |

Tab. 5 Network structure and parameter settings of the model
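The CA_Block layers in the structure table implement coordinate attention (reference [27]): features are pooled separately along the height and width axes, transformed, and turned into two directional gates that rescale the input. A minimal NumPy sketch of that pattern, with the learned 1×1 convolutions replaced by arbitrary fixed matrices `w_h` and `w_w` for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_h, w_w):
    # x: (C, H, W). Coordinate attention factorizes global pooling into
    # two 1D poolings so positional information survives along each axis.
    pool_h = x.mean(axis=2)          # (C, H): average over width
    pool_w = x.mean(axis=1)          # (C, W): average over height
    a_h = sigmoid(w_h @ pool_h)      # (C, H) gate along the height axis
    a_w = sigmoid(w_w @ pool_w)      # (C, W) gate along the width axis
    return x * a_h[:, :, None] * a_w[:, None, :]

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6, 6))       # toy feature map, 4 channels
w_h = rng.normal(size=(4, 4)) * 0.1  # stand-ins for learned transforms
w_w = rng.normal(size=(4, 4)) * 0.1
y = coordinate_attention(x, w_h, w_w)
assert y.shape == x.shape
```

Because each gate lies in (0, 1) and is indexed by a single spatial coordinate, the block can emphasize whole rows or columns of the feature map, which matches the long-range localization behavior described in the abstract.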
| UAV distance to dam surface/crest/m | Coverage area per frame/% | Recall/% | Precision/% | F1-score |
| --- | --- | --- | --- | --- |
| [10, 20) | 11 | 82.7 | 91.7 | 0.870 |
| [20, 30) | 20 | 81.6 | 91.2 | 0.861 |
| [30, 40) | 52 | 75.3 | 88.2 | 0.812 |
| [40, 50) | 97 | 58.1 | 69.6 | 0.633 |

Tab. 6 Comparison of detection performance for different distances from model to dam surface/dam top
| UAV distance to wave wall and similar regions/m | Coverage area per frame/% | Recall/% | Precision/% | F1-score |
| --- | --- | --- | --- | --- |
| [5, 10) | 7 | 87.6 | 93.2 | 0.903 |
| [10, 15) | 10 | 87.4 | 93.1 | 0.902 |
| [15, 20) | 15 | 75.8 | 89.6 | 0.821 |
| [20, 25) | 32 | 61.9 | 83.2 | 0.710 |

Tab. 7 Comparison of detection performance for different distances from model to regions such as wave wall
| No. | Model | Input size | mAP@0.5 | mAP@0.5:0.95 | Model size/MB | Computational cost/GFLOPs | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Faster R-CNN | 640×640 | 0.727 | 0.424 | 315.00 | 276.1 | 5 |
| 2 | RetinaNet | 640×640 | 0.739 | 0.453 | 277.20 | 220.0 | 26 |
| 3 | Cascade-Swin Transformer | 640×640 | 0.772 | 0.499 | 219.00 | 267.0 | 12 |
| 4 | SSD-Lite | 640×640 | 0.713 | 0.404 | 6.30 | 7.6 | 63 |
| 5 | EfficientDet | 640×640 | 0.724 | 0.398 | 15.10 | 16.9 | 25 |
| 6 | YOLOX-s | 640×640 | 0.696 | 0.315 | 17.70 | 26.8 | 33 |
| 7 | PPYOLOE-s | 640×640 | 0.717 | 0.376 | 15.90 | 17.4 | 51 |
| 8 | YOLOv6s | 640×640 | 0.731 | 0.372 | 28.40 | 44.2 | 37 |
| 9 | YOLOv7-tiny | 640×640 | 0.718 | 0.353 | 12.10 | 13.9 | 61 |
| 10 | YOLOv3-tiny | 640×640 | 0.695 | 0.322 | 16.60 | 12.9 | 56 |
| 11 | YOLOv5s-mobilenetv3 | 640×640 | 0.688 | 0.356 | 6.90 | 7.9 | 57 |
| 12 | YOLOv5s-ghostnet | 640×640 | 0.719 | 0.443 | 7.34 | 8.1 | 42 |
| 13 | YOLOv5s-shufflenet | 640×640 | 0.687 | 0.363 | 3.30 | 4.7 | 78 |
| 14 | YOLOv5s-Transformer | 640×640 | 0.741 | 0.446 | 15.60 | 17.6 | 40 |
| 15 | YOLOv5s | 640×640 | 0.701 | 0.328 | 14.40 | 16.6 | 43 |
| 16 | YOLOv5s6 | 640×640 | 0.729 | 0.411 | 23.80 | 16.2 | 41 |
| 17 | Proposed model | 640×640 | 0.806 | 0.501 | 13.60 | 14.5 | 54 |

Tab. 8 Comparison of different detection models
1 QIAN Z Y. Comprehensive report of strategic research on sustainable development of water resource in China[C]// Proceedings of the 2001 Annual Academic Conference of Chinese Hydraulic Engineering Society. Beijing: China Water and Power Press, 2001: 3-18.
2 LIU C D, XIANG Y, ZHANG S C, et al. The design and implementation of intelligent inspection system of reservoir dams based on big data[J]. China Water Resources, 2018(20): 39-41. 10.3969/j.issn.1000-1123.2018.20.010
3 WU Z R, GU C S, SHEN Z Z, et al. Theory and application of dam safety synthetical analysis and assessment[J]. Advances in Science and Technology of Water Resources, 1998, 18(3): 2-6, 65.
4 NISHIKAWA T, YOSHIDA J, SUGIYAMA T, et al. Concrete crack detection by multiple sequential image filtering[J]. Computer-Aided Civil and Infrastructure Engineering, 2012, 27(1): 29-47. 10.1111/j.1467-8667.2011.00716.x
5 WANG H L, QI X L, WU G S. Research progress of object detection technology based on convolutional neural network in deep learning[J]. Computer Science, 2018, 45(9): 11-19.
6 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. 10.1109/cvpr.2014.81
7 GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. 10.1109/iccv.2015.169
8 REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015, 1: 91-99.
9 TAO X, ZHANG D P, WANG Z H, et al. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(4): 1486-1498. 10.1109/tsmc.2018.2871750
10 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
11 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91
12 REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. 10.1109/cvpr.2017.690
13 REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2022-03-20].
14 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2022-03-23].
15 Ultralytics. YOLOv5[EB/OL]. [2022-03-23].
16 GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06) [2022-09-23].
17 LI C Y, LI L L, JIANG H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07) [2022-09-23].
18 WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. (2022-07-06) [2022-09-23]. 10.48550/arXiv.2207.02696
19 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. 10.1109/iccv.2017.324
20 TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787. 10.1109/cvpr42600.2020.01079
21 XU X H, ZHENG H, GUO Z Y, et al. SDD-CNN: small data-driven convolution neural networks for subtle roller defect inspection[J]. Applied Sciences, 2019, 9(7): No.1364. 10.3390/app9071364
22 ZHANG C B, CHANG C C, JAMSHIDI M. Bridge damage detection using a single-stage detector and field inspection images[EB/OL]. (2019-02-23) [2022-09-21]. 10.1111/mice.12500
23 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
24 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-04-12].
25 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10012-10022. 10.1109/iccv48922.2021.00986
26 CHEN C F R, FAN Q F, PANDA R. CrossViT: cross-attention multi-scale vision transformer for image classification[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 347-356. 10.1109/iccv48922.2021.00041
27 HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717. 10.1109/cvpr46437.2021.01350
28 ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9756-9765. 10.1109/cvpr42600.2020.00978
29 ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 12993-13000. 10.1609/aaai.v34i07.6999
30 HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
31 HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979, 28(1): 100-108. 10.2307/2346830
32 HAN H, WANG W Y, MAO B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]// Proceedings of the 2005 International Conference on Intelligent Computing, LNCS 3644. Berlin: Springer, 2005: 878-887.
33 DING X H, ZHANG X Y, HAN J G, et al. Diverse branch block: building a convolution as an inception-like unit[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10881-10890. 10.1109/cvpr46437.2021.01074
34 HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2022-04-12]. 10.48550/arXiv.1704.04861
35 HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. 10.1109/cvpr42600.2020.00165
36 ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856. 10.1109/cvpr.2018.00716