Real-time object detection algorithm for complex construction environments

Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1605-1612.DOI: 10.11772/j.issn.1001-9081.2023050687

Special Issue: Multimedia computing and computer simulation

Xiaogang SONG¹,², Dongdong ZHANG¹, Pengfei ZHANG¹, Li LIANG¹, Xinhong HEI¹,²

1. Faculty of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
2. Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi'an, Shaanxi 710048, China

Received: 2023-05-30 Revised: 2023-09-12 Accepted: 2023-09-14 Online: 2023-09-19 Published: 2024-05-10
Contact: Xiaogang SONG
About author: SONG Xiaogang, born in 1987, from Luohe, Henan, Ph. D., associate professor. His research interests include computer vision, autonomous navigation of unmanned systems.
ZHANG Dongdong, born in 1998, from Chenzhou, Hunan, M. S. candidate. His research interests include object detection, video action recognition.
ZHANG Pengfei, born in 1998, from Sanmenxia, Henan, M. S. candidate. His research interests include object detection, camouflaged object detection.
LIANG Li, born in 1964, from Xi'an, Shaanxi, M. S., associate professor. Her research interests include deep learning, machine vision.
HEI Xinhong, born in 1976, from Xi'an, Shaanxi, Ph. D., professor, doctoral supervisor, CCF distinguished member. His research interests include artificial intelligence, intelligent construction.
Supported by:
National Key R&D Program of China (2022YFB2602203)

Abstract

To address problems common in construction environments, namely cluttered scenes, occluded objects, a wide range of object scales, imbalance between positive and negative samples, and the insufficient real-time performance of existing detection algorithms, a real-time object detection algorithm for complex construction environments, YOLO-C, was proposed. The extracted low-level features were fused with the high-level features to enhance the global perception capability of the network, and a small object detection layer was designed to improve detection accuracy for objects of different scales. A Channel-Spatial Attention (CSA) module was designed to enhance object features and suppress background features. In the loss function, VariFocal Loss was used to calculate the classification loss to address the imbalance between positive and negative samples. GhostConv was used as the basic convolutional block to construct the GCSP (Ghost Cross Stage Partial) structure, reducing the number of parameters and the amount of computation. For complex construction environments, a concrete construction site object detection dataset was constructed, and comparison experiments with several algorithms were conducted on it. Experimental results demonstrate that YOLO-C achieves higher detection accuracy with fewer parameters, making it more suitable for object detection tasks in complex construction environments.

Key words: real-time object detection, YOLOv5s, concrete construction site, attention mechanism, lightweight
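The abstract names VariFocal Loss as the classification loss but does not give its form. As a rough illustration only, the sketch below follows the per-sample definition from the VarifocalNet paper (reference 21): for a positive sample with IoU-aware target q > 0 the loss is a binary cross-entropy weighted by q, and for a negative sample (q = 0) it is down-weighted by α·p^γ. The function name `varifocal_loss` and the defaults α = 0.75, γ = 2.0 are taken from that paper as assumptions; the settings actually used in YOLO-C are not stated on this page.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75,
                   gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Per-sample Varifocal Loss (sketch, scalar version).

    p: predicted classification score in (0, 1)
    q: target score; the IoU with the ground-truth box for a positive
       sample, 0 for a negative sample
    alpha, gamma: assumed defaults from the VarifocalNet paper
    """
    # Clamp the prediction so that log() is always defined.
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        # Positive sample: binary cross-entropy weighted by the target q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: scaled by alpha * p**gamma, so easy negatives
    # (small p) contribute very little to the total loss.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


if __name__ == "__main__":
    for p, q in [(0.1, 0.0), (0.9, 0.0), (0.5, 0.8), (0.9, 0.8)]:
        print(f"p={p:.1f} q={q:.1f} loss={varifocal_loss(p, q):.4f}")
```

Under these assumptions an easy negative (low p) is penalised far less than a confident false positive (high p), which is the mechanism by which the loss counteracts the large number of background samples.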

CLC Number: TP391.4

Xiaogang SONG, Dongdong ZHANG, Pengfei ZHANG, Li LIANG, Xinhong HEI. Real-time object detection algorithm for complex construction environments[J]. Journal of Computer Applications, 2024, 44(5): 1605-1612.

References

1	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. DOI: 10.1145/3065386
2	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/tpami.2016.2577031
3	LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. DOI: 10.1109/cvpr.2017.106
4	HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. DOI: 10.1109/iccv.2017.322
5	CAI Z, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6154-6162. DOI: 10.1109/cvpr.2018.00644
6	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
7	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2
8	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
9	LIN T-Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI: 10.1109/iccv.2017.324
10	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2023-05-07].
11	BOCHKOVSKIY A, WANG C-Y, LIAO H-Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2023-05-07].
12	GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2023-05-07].
13	XU S, WANG X, LV W, et al. PP-YOLOE: an evolved version of YOLO[EB/OL]. (2022-12-12)[2023-05-07].
14	LIU H, ZHANG L Y, WANG F G, et al. Object detection algorithm based on attention mechanism and context information[J]. Journal of Computer Applications, 2023, 43(5): 1557-1564.
15	LI J D, ZHANG D P, FAN Y Q, et al. Lightweight ship target detection algorithm based on improved YOLOv5[J]. Journal of Computer Applications, 2023, 43(3): 923-929. DOI: 10.11772/j.issn.1001-9081.2022071096
16	WANG H J, LI G M, ZHANG H L, et al. Rotating object detection method based on convolutional block channel attention in remote sensing images[J/OL]. Computer Engineering and Applications: 1-14 [2023-05-27]. DOI: 10.3778/j.issn.1002-8331.2211-0037
17	SHENG B Y, HOU J, LI J X, et al. Road object detection method for complex road scenes[J]. Computer Engineering and Applications, 2023, 59(15): 87-96. DOI: 10.3778/j.issn.1002-8331.2212-0093
18	WANG C-Y, LIAO H-Y M, WU Y-H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2020: 1571-1580. DOI: 10.1109/cvprw50498.2020.00203
19	HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. DOI: 10.1109/tpami.2015.2389824
20	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
21	ZHANG H, WANG Y, DAYOUB F, et al. VarifocalNet: an IoU-aware dense object detector[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8514-8523. DOI: 10.1109/cvpr46437.2021.00841
22	HAN K, WANG Y, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1580-1589. DOI: 10.1109/cvpr42600.2020.00165

Algorithm	AP_0.5/% (tank truck)	AP_0.5/% (person)	AP_0.5/% (license plate)	AP_0.5/% (water pipe)	AP_0.5/% (bucket)	mAP_0.5/%	Parameters/10⁶	GFLOPs	FPS
Faster R-CNN [2]	89.3	86.9	87.1	43.9	85.8	78.6	27.82	130.7	8
SSD [7]	90.5	87.6	88.4	46.2	83.6	79.2	26.28	35.1	64
YOLOv4-tiny [11]	92.3	88.4	87.9	45.7	87.4	80.3	6.06	6.9	137
YOLOXs [12]	96.9	94.7	94.5	53.6	93.8	86.7	9.02	26.8	94
YOLOv5s	99.1	96.4	97.4	56.9	96.4	89.2	7.02	15.8	163
PP-YOLOEs [13]	99.0	96.3	98.1	63.8	96.4	90.7	7.93	17.4	150
YOLO-C	99.2	96.7	98.7	65.4	97.6	91.5	4.16	16.9	159
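The mAP_0.5 column in the comparison table above is consistent with a plain average of the five per-class AP_0.5 values. The short check below recomputes it for YOLOv5s and YOLO-C and derives the relative parameter reduction. The numbers are copied from the table; the variable and function names are illustrative only.

```python
# Per-class AP_0.5 (%) in table order:
# tank truck, person, license plate, water pipe, bucket.
AP_50 = {
    "YOLOv5s": [99.1, 96.4, 97.4, 56.9, 96.4],
    "YOLO-C": [99.2, 96.7, 98.7, 65.4, 97.6],
}
# Parameter counts in millions, from the same table.
PARAMS_M = {"YOLOv5s": 7.02, "YOLO-C": 4.16}


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


map_base = mean_ap(AP_50["YOLOv5s"])
map_ours = mean_ap(AP_50["YOLO-C"])
# Fraction of YOLOv5s parameters removed in YOLO-C.
param_reduction = (PARAMS_M["YOLOv5s"] - PARAMS_M["YOLO-C"]) / PARAMS_M["YOLOv5s"]

if __name__ == "__main__":
    print(f"YOLOv5s mAP_0.5: {map_base:.1f}%")
    print(f"YOLO-C  mAP_0.5: {map_ours:.1f}%")
    print(f"mAP gain: {map_ours - map_base:.1f} points")
    print(f"Parameter reduction: {100 * param_reduction:.1f}%")
```

On these figures the largest per-class gain is on the water pipe class (56.9% to 65.4%), which is also the class on which every compared detector scores lowest.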

No.	GhostConv	Multi-scale detection	CSA	VariFocal Loss	mAP_0.5/%	Parameters/10⁶	GFLOPs	FPS
0					89.2	7.02	15.8	163
1	√				88.6	3.69	8.2	192
2		√			90.0	7.69	27.1	96
3			√		89.6	7.11	18.4	128
4				√	89.8	7.02	15.7	158
5	√	√			89.1	4.03	13.8	125
6	√	√	√		91.2	4.16	16.9	157
7	√	√	√	√	91.5	4.16	16.9	159
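To read the ablation table above more easily, each configuration can be expressed as a difference from the baseline (row 0, plain YOLOv5s). The sketch below does this with the values copied from the table; the `Run` type, the `RUNS` mapping, and the `delta` helper are illustrative names, not part of the paper.

```python
from typing import NamedTuple, Tuple


class Run(NamedTuple):
    modules: Tuple[str, ...]  # components enabled in this configuration
    map50: float              # mAP_0.5 in %
    params_m: float           # parameters in millions
    gflops: float
    fps: int


# Rows 0-7 of the ablation table.
RUNS = {
    0: Run((), 89.2, 7.02, 15.8, 163),
    1: Run(("GhostConv",), 88.6, 3.69, 8.2, 192),
    2: Run(("multi-scale detection",), 90.0, 7.69, 27.1, 96),
    3: Run(("CSA",), 89.6, 7.11, 18.4, 128),
    4: Run(("VariFocal Loss",), 89.8, 7.02, 15.7, 158),
    5: Run(("GhostConv", "multi-scale detection"), 89.1, 4.03, 13.8, 125),
    6: Run(("GhostConv", "multi-scale detection", "CSA"), 91.2, 4.16, 16.9, 157),
    7: Run(("GhostConv", "multi-scale detection", "CSA", "VariFocal Loss"),
           91.5, 4.16, 16.9, 159),
}


def delta(run_id: int, baseline_id: int = 0) -> dict:
    """Difference of one configuration from the baseline, per metric."""
    run, base = RUNS[run_id], RUNS[baseline_id]
    return {
        "map50": round(run.map50 - base.map50, 1),
        "params_m": round(run.params_m - base.params_m, 2),
        "gflops": round(run.gflops - base.gflops, 1),
        "fps": run.fps - base.fps,
    }


if __name__ == "__main__":
    for run_id in sorted(RUNS):
        names = ", ".join(RUNS[run_id].modules) or "baseline"
        print(run_id, names, delta(run_id))
```

Read this way, GhostConv alone removes about 3.3 million parameters at a cost of 0.6 points of mAP_0.5, while the full configuration (row 7) gains 2.3 points with 2.86 million fewer parameters and 4 FPS less than the baseline.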
