Lightweight network for rebar detection with attention mechanism

Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2900-2908. DOI: 10.11772/j.issn.1001-9081.2021071136

• Multimedia computing and computer simulation •

Yaoshun LI, Lizhi LIU

Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan, Hubei 430205, China

Received: 2021-07-01 Revised: 2021-09-13 Accepted: 2021-09-15 Online: 2021-09-22 Published: 2022-09-10
Corresponding author: Lizhi LIU
About the author: LI Yaoshun, born in 1998 in Jingzhou, Hubei, M.S. candidate. His research interests include deep learning and object detection.
Supported by: Guidance Project of Scientific Research Plan of Hubei Provincial Department of Education (B2017051); Open Fund of Hubei Key Laboratory of Intelligent Robot (HBIRL202002)

Abstract

Devices on smart construction sites have limited memory and computing power, which makes real-time rebar detection on on-site equipment difficult: detection is slow and model deployment is costly. To address these problems, RebarNet, a lightweight rebar detection network with embedded attention mechanisms, is proposed on the basis of YOLOv3 (You Only Look Once version 3). Firstly, residual blocks are used as the basic units of the feature extraction structure to extract local and contextual information. Secondly, a Channel Attention (CA) module and a Spatial Attention (SA) module are added to the residual blocks to adjust the attention weights of the feature maps and improve the network's ability to extract features. Thirdly, a feature pyramid fusion module is used to enlarge the receptive field of the network and improve feature extraction for medium-sized rebar. Finally, the 52×52 feature map obtained after 8× downsampling is output for post-processing and rebar detection. Experimental results show that the proposed network has only 5% of the parameters of the Darknet53 network and achieves a mAP (mean Average Precision) of 92.7% at 106.8 FPS (Frames Per Second) on the rebar test set. Compared with eight existing object detection networks, namely EfficientDet (Scalable and Efficient Object Detection), SSD (Single Shot MultiBox Detector), CenterNet, RetinaNet, Faster RCNN (Faster Region-CNN), YOLOv3, YOLOv4 and YOLOv5m (YOLOv5 medium), RebarNet has a relatively short training time (24.5 s), the lowest GPU memory usage (1 956 MB) and the smallest model weight file (13 MB). Compared with YOLOv5m, the best-performing of these networks, RebarNet has a mAP that is 0.4 percentage points lower, while its detection speed is 48 FPS higher, 1.8 times that of YOLOv5m. These results indicate that the proposed network is suited to the efficient and accurate rebar detection required on smart construction sites.

Key words: rebar detection, YOLOv3, attention mechanism, feature pyramid, lightweight network
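
The abstract does not describe the internal layout of the CA and SA modules. As a rough illustration only, the sketch below applies a parameter-free channel reweighting followed by a spatial reweighting inside a residual connection. The function names, the CA-then-SA order, the sigmoid gating, and the omission of all convolutions and learned weights are assumptions made for this example; it is not the RebarNet implementation.

```python
import math


def sigmoid(x):
    """Logistic function used here as the attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    fmap is a list of channels; each channel is a list of rows (C x H x W).
    Assumption: no learned layers, the gate is sigmoid(mean of the channel).
    """
    gated = []
    for channel in fmap:
        count = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / count
        weight = sigmoid(mean)
        gated.append([[value * weight for value in row] for row in channel])
    return gated


def spatial_attention(fmap):
    """Scale each spatial position by a gate computed from its cross-channel mean."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [
        [sigmoid(sum(fmap[c][i][j] for c in range(channels)) / channels)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * mask[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


def attention_residual_block(fmap):
    """Residual unit: input + SA(CA(input)). Convolutions are omitted (assumption)."""
    refined = spatial_attention(channel_attention(fmap))
    return [
        [[x + r for x, r in zip(row_x, row_r)] for row_x, row_r in zip(ch_x, ch_r)]
        for ch_x, ch_r in zip(fmap, refined)
    ]


# Tiny 2-channel, 2x2 feature map for demonstration.
demo = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(attention_residual_block(demo))
```

In a trained network the gates would come from learned layers rather than raw means; the sketch only shows where channel-wise and position-wise weights enter a residual unit.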

CLC Number: TP399

Yaoshun LI, Lizhi LIU. Lightweight network for rebar detection with attention mechanism[J]. Journal of Computer Applications, 2022, 42(9): 2900-2908.

Figures/Tables (12)

Fig. 1 Structure of Darknet-53

Fig. 2 Schematic diagram of feature map prediction process

Tab. 1 Anchors of YOLOv3

Feature map scale	Anchor boxes
13 × 13	[116, 90], [156, 198], [373, 326]
26 × 26	[30, 61], [62, 45], [59, 119]
52 × 52	[10, 13], [16, 30], [33, 23]
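
The three scales in Tab. 1 match the grid sizes that output strides of 32, 16 and 8 produce on a 416×416 input. The short sketch below checks that arithmetic. The 416-pixel input size and the helper names are assumptions for illustration (this page does not state the input resolution); the anchors are copied from Tab. 1.

```python
INPUT_SIZE = 416  # assumption: common YOLOv3 input resolution, not stated on this page

# Anchors from Tab. 1, keyed by the output stride assumed for each scale.
ANCHORS_BY_STRIDE = {
    32: [(116, 90), (156, 198), (373, 326)],  # 13 x 13 scale
    16: [(30, 61), (62, 45), (59, 119)],      # 26 x 26 scale
    8: [(10, 13), (16, 30), (33, 23)],        # 52 x 52 scale
}


def grid_size(input_size, stride):
    """Side length of the feature map after downsampling by `stride`."""
    if input_size % stride != 0:
        raise ValueError("input size must be divisible by the stride")
    return input_size // stride


def candidate_boxes(input_size, stride, anchors):
    """Number of boxes predicted at one scale: one box per anchor per grid cell."""
    side = grid_size(input_size, stride)
    return side * side * len(anchors)


for stride, anchors in ANCHORS_BY_STRIDE.items():
    side = grid_size(INPUT_SIZE, stride)
    total = candidate_boxes(INPUT_SIZE, stride, anchors)
    print(f"stride {stride:>2}: {side}x{side} grid, {total} candidate boxes")
```

Under these assumptions, a detector that keeps only the stride-8 output, as the abstract describes for RebarNet, would score 52 × 52 × 3 = 8 112 candidate boxes per image if all three small anchors are retained.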

Fig. 3 Distribution of width and height of rebar

Fig. 4 Heatmap of 52×52 channel in different rebar detection networks

Fig. 5 Backbone structure of the proposed model

Fig. 6 Flowchart of rebar detection by RebarNet network

Tab. 2 Comparison of model parameters

Object detection network	Convolutional layers	Trainable parameters/10⁴	GPU memory usage/MB	Model weight file/MB
EfficientDet	176	359	2 927	15
SSD	35	2 374	3 146	91
CenterNet	62	3 266	2 166	124
RetinaNet	53	2 350	3 919	138
Faster RCNN	43	854	5 230	108
YOLOv3	75	6 157	5 192	235
YOLOv4	182	6 393	4 184	244
YOLOv5m	94	2 156	4 876	57
Proposed network (RebarNet)	30	347	1 956	13
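
The abstract's statement that the proposed network has about 5% of the parameters of Darknet53 can be compared against Tab. 2. The sketch below takes the YOLOv3 row as a stand-in for the Darknet53-based model (an assumption, since the table has no separate Darknet53 row) and computes the size ratios; the `Row` type and function names are illustrative.

```python
from collections import namedtuple

# Columns follow Tab. 2: layers, parameters (x10^4), GPU memory (MB), weight file (MB).
Row = namedtuple("Row", "conv_layers params_1e4 gpu_mb weights_mb")

TABLE2 = {
    "EfficientDet": Row(176, 359, 2927, 15),
    "SSD": Row(35, 2374, 3146, 91),
    "CenterNet": Row(62, 3266, 2166, 124),
    "RetinaNet": Row(53, 2350, 3919, 138),
    "Faster RCNN": Row(43, 854, 5230, 108),
    "YOLOv3": Row(75, 6157, 5192, 235),
    "YOLOv4": Row(182, 6393, 4184, 244),
    "YOLOv5m": Row(94, 2156, 4876, 57),
    "RebarNet": Row(30, 347, 1956, 13),
}


def ratio_percent(value, baseline):
    """Express `value` as a percentage of `baseline`, rounded to one decimal."""
    return round(100.0 * value / baseline, 1)


def smallest(field):
    """Name of the network with the smallest value in the given column."""
    return min(TABLE2, key=lambda name: getattr(TABLE2[name], field))


proposed, baseline = TABLE2["RebarNet"], TABLE2["YOLOv3"]
print("parameters vs YOLOv3 (%):", ratio_percent(proposed.params_1e4, baseline.params_1e4))
print("weight file vs YOLOv3 (%):", ratio_percent(proposed.weights_mb, baseline.weights_mb))
print("lowest GPU memory:", smallest("gpu_mb"))
```

With the table values, both ratios come out between 5% and 6%, which is consistent with the rounded figure quoted in the abstract.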

Tab. 3 Partition and usage of dataset

Dataset	Number of images	Annotation file	Purpose
Train	225	train.txt	Model training
Val	25	val.txt	mAP calculation during model training
Test	200		Manually counted; used to evaluate model Accuracy and FPS

Fig. 7 Data augmentation

Tab. 4 Evaluation indexes of different networks

Object detection network	TrainTime/s	mAP	Accuracy	FPS/(frame·s⁻¹)
EfficientDet	21.5	0.010	0.056	17.3
SSD	20.5	0.117	0.227	45.7
CenterNet	23.8	0.123	0.278	43.4
RetinaNet	36.2	0.462	0.504	21.8
Faster RCNN	97.8	0.682	0.517	11.2
YOLOv3	43.4	0.889	0.887	38.2
YOLOv4	27.5	0.916	0.923	26.8
YOLOv5m	31.1	0.931	0.933	58.1
Proposed network (RebarNet)	24.5	0.927	0.931	106.8
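
The comparison with YOLOv5m quoted in the abstract can be recomputed from Tab. 4. The sketch below is only a check of that arithmetic; the dictionary layout and function names are illustrative choices, and the values are copied from the table.

```python
# name: (train_time_s, mAP, accuracy, fps), copied from Tab. 4
TABLE4 = {
    "EfficientDet": (21.5, 0.010, 0.056, 17.3),
    "SSD": (20.5, 0.117, 0.227, 45.7),
    "CenterNet": (23.8, 0.123, 0.278, 43.4),
    "RetinaNet": (36.2, 0.462, 0.504, 21.8),
    "Faster RCNN": (97.8, 0.682, 0.517, 11.2),
    "YOLOv3": (43.4, 0.889, 0.887, 38.2),
    "YOLOv4": (27.5, 0.916, 0.923, 26.8),
    "YOLOv5m": (31.1, 0.931, 0.933, 58.1),
    "RebarNet": (24.5, 0.927, 0.931, 106.8),
}


def compare(name, baseline):
    """Return (mAP gap in percentage points, FPS gain, FPS ratio) versus a baseline."""
    _, map_a, _, fps_a = TABLE4[name]
    _, map_b, _, fps_b = TABLE4[baseline]
    return (
        round((map_b - map_a) * 100, 1),  # how far `name` trails the baseline
        round(fps_a - fps_b, 1),
        round(fps_a / fps_b, 2),
    )


def train_time_rank(name):
    """1-based rank of `name` when networks are sorted by training time, shortest first."""
    ordered = sorted(TABLE4, key=lambda n: TABLE4[n][0])
    return ordered.index(name) + 1


map_gap, fps_gain, fps_ratio = compare("RebarNet", "YOLOv5m")
print(f"mAP gap: {map_gap} points, FPS gain: {fps_gain}, FPS ratio: {fps_ratio}")
print("training-time rank of RebarNet:", train_time_rank("RebarNet"))
```

From the table values, the FPS gain is 48.7 and the ratio about 1.84, so the "48 FPS" and "1.8 times" in the abstract appear to be rounded down. RebarNet's training time ranks fourth shortest of the nine networks, after SSD, EfficientDet and CenterNet.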

Fig. 8 Actual detection effect on Test dataset

