Lightweight network for rebar detection with attention mechanism

doi:10.11772/j.issn.1001-9081.2021071136

Abstract

Equipment on smart construction sites has limited memory and computing power, which makes real-time rebar detection through on-device object detection very difficult. Slow detection and the high cost of deploying models on such equipment add further challenges. To address these problems, RebarNet, a lightweight rebar detection network with an attention mechanism, was proposed on the basis of YOLOv3 (You Only Look Once version 3). Firstly, the residual block was used as the basic unit of the network to build a feature extraction structure that captures local and contextual information. Secondly, a Channel Attention (CA) module and a Spatial Attention (SA) module were added to the residual block to adjust the attention weights of the feature map and improve the network's feature extraction ability. Thirdly, a feature pyramid fusion module was used to enlarge the receptive field of the network and improve feature extraction for medium-sized rebar images. Finally, the feature map of the 52×52 channel, obtained after 8-fold downsampling, was output for post-processing and rebar detection. Experimental results show that the proposed network has only 5% of the parameters of the Darknet53 network and achieves an mAP (mean Average Precision) of 92.7% at 106.8 FPS (Frames Per Second) on the rebar test dataset. Compared with eight existing object detection networks, namely EfficientDet (Scalable and Efficient Object Detection), SSD (Single Shot MultiBox Detector), CenterNet, RetinaNet, Faster RCNN (Faster Region-CNN), YOLOv3, YOLOv4 and YOLOv5m (YOLOv5 medium), RebarNet has a relatively short training time (24.5 s), the lowest GPU memory usage (1 956 MB) and the smallest model weight file (13 MB). Compared with YOLOv5m, the best-performing of these networks, RebarNet has an mAP only 0.4 percentage points lower, while its detection speed is 48 FPS higher, 1.8 times that of YOLOv5m. These results indicate that the proposed network supports efficient and accurate rebar detection on smart construction sites.

Key words: rebar detection, YOLOv3, attention mechanism, feature pyramid, lightweight network
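
The abstract describes residual blocks augmented with channel and spatial attention. As a rough illustration only, the sketch below applies a channel gate and then a spatial gate to a tiny feature map and adds the result back to the input as a residual. It is a parameter-free simplification written for this page: the gates are computed from average and max pooling alone, without the learned layers a trained network would use. The channel-then-spatial order and all function names are assumptions, not details taken from RebarNet.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate built from its global average and max.

    fmap is a list of C channels, each an H x W list of lists.
    Assumption: no learned layers; a trained network would pass the pooled
    values through learned weights before the sigmoid.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each pixel by a gate built from the mean and max across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            values = [fmap[k][i][j] for k in range(c)]
            gate = sigmoid(sum(values) / c + max(values))
            for k in range(c):
                out[k][i][j] = fmap[k][i][j] * gate
    return out


def attention_residual(fmap):
    """Residual connection: input + spatial_attention(channel_attention(input))."""
    refined = spatial_attention(channel_attention(fmap))
    return [
        [[x + r for x, r in zip(x_row, r_row)] for x_row, r_row in zip(x_ch, r_ch)]
        for x_ch, r_ch in zip(fmap, refined)
    ]


feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1 (all zeros)
]
output = attention_residual(feature_map)
```

The output keeps the input's shape, and because every gate lies strictly between 0 and 1, each positive input value ends up between its original value and twice that value.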

CLC Number:

TP399

Yaoshun LI, Lizhi LIU. Lightweight network for rebar detection with attention mechanism[J]. Journal of Computer Applications, 2022, 42(9): 2900-2908.

Figures/Tables 12

Fig. 1 Structure of Darknet-53

Fig. 2 Schematic diagram of feature map prediction process

Tab. 1 Anchors of YOLOv3

Channel	Anchor boxes
13×13	[116, 90], [156, 198], [373, 326]
26×26	[30, 61], [62, 45], [59, 119]
52×52	[10, 13], [16, 30], [33, 23]
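
Table 1 groups the nine YOLOv3 anchors by output scale. The sketch below shows how the grid sizes relate to the downsampling factor and how a box could be matched to its best anchor by width-height IoU. The 416×416 input size and the strides of 32, 16 and 8 are assumptions (they are consistent with the 13, 26 and 52 grids but are not stated on this page), and the helper names are hypothetical.

```python
INPUT_SIZE = 416  # assumed input resolution; 416 / 8 = 52

# Anchors from Table 1, keyed by the grid size of each output scale.
ANCHORS = {
    13: [(116, 90), (156, 198), (373, 326)],
    26: [(30, 61), (62, 45), (59, 119)],
    52: [(10, 13), (16, 30), (33, 23)],
}


def grid_size(stride, input_size=INPUT_SIZE):
    """Side length of the output grid after downsampling by `stride`."""
    return input_size // stride


def wh_iou(box, anchor):
    """IoU of two boxes sharing the same centre, using width and height only."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def best_anchor(box):
    """Return (grid, anchor) for the anchor with the highest width-height IoU."""
    candidates = (
        (grid, anchor) for grid, anchors in ANCHORS.items() for anchor in anchors
    )
    return max(candidates, key=lambda pair: wh_iou(box, pair[1]))


grids = [grid_size(stride) for stride in (32, 16, 8)]  # [13, 26, 52]
small = best_anchor((12, 14))     # (52, (10, 13))
large = best_anchor((150, 200))   # (13, (156, 198))
```

Under these assumptions, small boxes are matched to anchors of the 52×52 scale and large boxes to the 13×13 scale.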

Fig. 3 Distribution of width and height of rebar

Fig. 4 Heatmap of 52×52 channel in different rebar detection networks

Fig. 5 Backbone structure of the proposed model

Fig. 6 Flowchart of rebar detection by RebarNet network

Tab. 2 Comparison of model parameters

Object detection network	Convolutional layers	Total trainable parameters/10⁴	GPU memory usage/MB	Model weight file/MB
EfficientDet	176	359	2 927	15
SSD	35	2 374	3 146	91
CenterNet	62	3 266	2 166	124
RetinaNet	53	2 350	3 919	138
Faster RCNN	43	854	5 230	108
YOLOv3	75	6 157	5 192	235
YOLOv4	182	6 393	4 184	244
YOLOv5m	94	2 156	4 876	57
Proposed network	30	347	1 956	13
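
As a quick check of Table 2, the sketch below computes RebarNet's parameter count relative to YOLOv3 and confirms which network has the smallest GPU memory footprint and weight file. Treating the YOLOv3 row as the Darknet53-based reference for the "5% of parameters" claim in the abstract is an assumption, and the dictionary layout is only illustrative.

```python
# Rows of Table 2: (parameters in units of 10^4, GPU memory in MB, weight file in MB).
TABLE_2 = {
    "EfficientDet": (359, 2927, 15),
    "SSD": (2374, 3146, 91),
    "CenterNet": (3266, 2166, 124),
    "RetinaNet": (2350, 3919, 138),
    "Faster RCNN": (854, 5230, 108),
    "YOLOv3": (6157, 5192, 235),
    "YOLOv4": (6393, 4184, 244),
    "YOLOv5m": (2156, 4876, 57),
    "RebarNet": (347, 1956, 13),  # the proposed network (last row of the table)
}

# Assumption: the YOLOv3 row stands in for the Darknet53-based baseline.
param_ratio = TABLE_2["RebarNet"][0] / TABLE_2["YOLOv3"][0]
lowest_memory = min(TABLE_2, key=lambda name: TABLE_2[name][1])
smallest_weights = min(TABLE_2, key=lambda name: TABLE_2[name][2])

print(f"parameters vs YOLOv3: {param_ratio:.1%}")  # parameters vs YOLOv3: 5.6%
print(lowest_memory, smallest_weights)             # RebarNet RebarNet
```

The ratio comes out at about 5.6%, in line with the roughly 5% figure quoted in the abstract.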

Tab. 3 Partition and usage of dataset

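Dataset	Number of images	Annotation file	Usage
Train	225	train.txt	Model training
Val	25	val.txt	mAP calculation during model training
Test	200		Manually counted; used to evaluate model Accuracy and FPS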

Fig. 7 Data augmentation

Tab. 4 Evaluation indexes of different networks

Object detection network	TrainTime/s	mAP	Accuracy	FPS/(frame·s⁻¹)
EfficientDet	21.5	0.010	0.056	17.3
SSD	20.5	0.117	0.227	45.7
CenterNet	23.8	0.123	0.278	43.4
RetinaNet	36.2	0.462	0.504	21.8
Faster RCNN	97.8	0.682	0.517	11.2
YOLOv3	43.4	0.889	0.887	38.2
YOLOv4	27.5	0.916	0.923	26.8
YOLOv5m	31.1	0.931	0.933	58.1
Proposed network	24.5	0.927	0.931	106.8
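
The headline comparison with YOLOv5m in the abstract can be reproduced from Table 4. The short calculation below uses only values from the table; the variable names are illustrative.

```python
# (mAP, FPS) taken from Table 4.
yolov5m = (0.931, 58.1)
rebarnet = (0.927, 106.8)  # the proposed network (last row of the table)

map_gap_points = round((yolov5m[0] - rebarnet[0]) * 100, 1)  # percentage points
fps_gain = round(rebarnet[1] - yolov5m[1], 1)                # frames per second
speedup = round(rebarnet[1] / yolov5m[1], 1)                 # ratio of FPS values

print(map_gap_points, fps_gain, speedup)  # 0.4 48.7 1.8
```

The 48.7 FPS difference corresponds to the "48 FPS higher" wording in the abstract, and the 1.8 ratio matches the stated speedup over YOLOv5m.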

Fig. 8 Actual detection effect on Test dataset
