CDC-DETR: multi-scale real-time human-vehicle detection method for complex traffic scenarios

doi:10.11772/j.issn.1001-9081.2025040472

Journal of Computer Applications, 2026, Vol. 46, Issue (4): 1283-1291.

• Multimedia computing and computer simulation •

Xinyi YAN¹, Linglong ZHU²,³,⁴, Yonghong ZHANG¹,²,⁴

1. School of Automation, Nanjing University of Information Science and Technology, Nanjing Jiangsu 210044, China
2. School of Internet of Things Engineering, Wuxi University, Wuxi Jiangsu 214105, China
3. Traffic Management Research Institute of the Ministry of Public Security, Wuxi Jiangsu 214151, China
4. Wuxi Key Laboratory of Telematics, Wuxi University, Wuxi Jiangsu 214105, China

Received: 2025-05-06 Revised: 2025-07-21 Accepted: 2025-07-23 Online: 2026-04-21 Published: 2026-04-10
Contact: Yonghong ZHANG
About author: YAN Xinyi, born in 2002, from Yancheng, Jiangsu, M. S. candidate. Her research interests include computer vision and deep learning.
ZHU Linglong, born in 1993, from Nantong, Jiangsu, Ph. D., associate professor. His research interests include big data for transportation.
Supported by:
National Natural Science Foundation of China (42175157, 42305158); Program of "Light of Taihu Lake" Science and Technology (Basic Research) of Wuxi City, Jiangsu Province (K20231021)

Abstract:

The complexity and variability of traffic scenarios challenge existing human-vehicle detection algorithms: under occlusion, illumination changes, and multi-scale targets, they tend to suffer from insufficient accuracy and low computational efficiency. To address these problems, an improved detection model, CDC-DETR (CPPA-DWRC-CGNET-DETR), was developed based on the RT-DETR (Real-Time DEtection TRansformer) architecture. Firstly, a Context Pre-activation Pooling Attention (CPPA) module was designed to enhance long-range dependencies and optimize feature extraction. Secondly, a Dilation-Wise Residual Connection (DWRC) module was introduced to improve multi-scale feature representation. Thirdly, a lightweight Context Guided Block (CG Block) was proposed to fuse local, surrounding, and global information and reduce computational cost. Finally, these modules were integrated to construct a high-accuracy, efficient real-time human-vehicle detection model suitable for complex traffic scenarios. Experimental results on the BDD100K dataset show that, compared with RT-DETR at an Intersection over Union (IoU) threshold of 0.5, CDC-DETR improves the mean Average Precision (mAP) by 6.12%, increases the recall by 4.35%, and decreases the number of floating-point operations by 11.23%, significantly enhancing computational efficiency and providing an effective solution for deployment on edge devices.
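
The abstract reports mAP at an IoU threshold of 0.5. As a minimal sketch of what that threshold means, the snippet below computes the IoU of two axis-aligned boxes and applies the usual rule that a prediction matches a ground-truth box when their IoU is at least 0.5. The `(x1, y1, x2, y2)` box format and the helper names `iou` and `is_match` are assumptions made for illustration; the paper's own evaluation code is not shown on this page.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: True when the prediction overlaps the ground truth enough."""
    return iou(pred_box, gt_box) >= threshold


ground_truth = (0, 0, 10, 10)
close_prediction = (2, 0, 12, 10)   # overlap 80, union 120
loose_prediction = (5, 0, 15, 10)   # overlap 50, union 150

print(is_match(close_prediction, ground_truth))
print(is_match(loose_prediction, ground_truth))
```

With these boxes the first prediction has an IoU of 2/3 and counts as a match, while the second has an IoU of 1/3 and does not.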

Key words: assisted driving, human-vehicle detection, Transformer, intelligent traffic perception, multi-scale feature fusion

CLC Number:

TP391.41

Xinyi YAN, Linglong ZHU, Yonghong ZHANG. CDC-DETR: multi-scale real-time human-vehicle detection method for complex traffic scenarios[J]. Journal of Computer Applications, 2026, 46(4): 1283-1291.


Figures/Tables 12

Category	Precision	Recall	mAP_0.5	mAP_0.5-0.95	F1 score	Accuracy
Overall	0.693	0.552	0.590	0.365	0.615	0.537
Pedestrian	0.702	0.491	0.579	0.256	0.579	0.690
Car	0.806	0.721	0.789	0.474	0.761	0.840
Bus	0.640	0.474	0.490	0.376	0.545	0.550
Truck	0.625	0.524	0.504	0.354	0.571	0.580
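
The F1 column in this table can be cross-checked against the precision and recall columns using the standard definition F1 = 2PR / (P + R). The sketch below does that for each row. The dictionary simply restates the table values with English category names, and `f1_score` is a helper defined here, not code from the paper.

```python
# (precision, recall, reported F1) per category, copied from the per-category table.
rows = {
    "Overall": (0.693, 0.552, 0.615),
    "Pedestrian": (0.702, 0.491, 0.579),
    "Car": (0.806, 0.721, 0.761),
    "Bus": (0.640, 0.474, 0.545),
    "Truck": (0.625, 0.524, 0.571),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Largest gap between the recomputed and the reported F1 over all rows.
max_gap = 0.0
for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    max_gap = max(max_gap, abs(computed - reported))
    print(f"{name:<10} computed={computed:.4f} reported={reported:.3f}")

print(f"largest gap: {max_gap:.4f}")
```

The recomputed values agree with the reported column to within about 0.001, which is consistent with the precision and recall inputs having been rounded to three decimals.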

Model	Precision	Recall	mAP_0.5	mAP_0.5-0.95	F1 score	Parameters/10⁶	Floating-point operations/GFLOPs	Accuracy
SSD	0.469	0.233	0.265	0.125	0.311	2 628	62.7	0.200
Faster R-CNN	0.338	0.240	0.178	0.107	0.281	2 848	188.2	0.137
Mask R-CNN	0.585	0.504	0.548	0.316	0.542	4 143	90.9	0.408
RetinaNet	0.633	0.450	0.507	0.299	0.526	3 668	84.5	0.377
RTMDet	0.454	0.400	0.348	0.196	0.425	2 760	54.1	0.261
YOLO11	0.648	0.553	0.596	0.383	0.598	2 003	67.7	0.434
YOLOv10	0.699	0.525	0.590	0.378	0.600	1 645	63.4	0.424
YOLOv8	0.724	0.539	0.599	0.384	0.616	2 584	78.7	0.443
YOLOv5	0.676	0.535	0.584	0.369	0.598	2 504	64.0	0.435
YOLOv3	0.690	0.547	0.605	0.387	0.609	10 366	282.2	0.456
RT-DETR	0.682	0.529	0.556	0.338	0.597	1 987	57.0	0.481
CDC-DETR	0.693	0.552	0.590	0.365	0.615	2 364	50.6	0.537
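
The percentages quoted in the abstract (mAP +6.12%, recall +4.35%, floating-point operations -11.23%) are relative changes, and they can be reproduced from the RT-DETR and CDC-DETR rows of this table. A short sketch of that arithmetic follows; the dictionary keys and the `relative_change` helper are names chosen here for illustration.

```python
def relative_change(baseline, new):
    """Change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Values copied from the RT-DETR and CDC-DETR rows of the comparison table.
rt_detr = {"mAP_0.5": 0.556, "recall": 0.529, "gflops": 57.0}
cdc_detr = {"mAP_0.5": 0.590, "recall": 0.552, "gflops": 50.6}

changes = {key: relative_change(rt_detr[key], cdc_detr[key]) for key in rt_detr}

for key, value in changes.items():
    print(f"{key}: {value:+.2f}%")
# mAP_0.5: +6.12%
# recall: +4.35%
# gflops: -11.23%
```

In absolute terms the same rows give a gain of 0.034 in mAP_0.5 and 0.023 in recall, with 6.4 fewer GFLOPs.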

No.	CPPA	DWRC	CG Block	Parameters/10⁶	Floating-point operations/GFLOPs	Precision	Recall	mAP_0.5	mAP_0.5-0.95	Accuracy
1				1 987	57.0	0.682	0.529	0.556	0.338	0.481
2	√			2 041	57.4	0.715	0.533	0.571	0.348	0.492
3		√		2 453	60.7	0.669	0.534	0.570	0.351	0.483
4			√	1 652	47.6	0.669	0.534	0.560	0.343	0.511
5	√	√		2 507	61.1	0.702	0.532	0.571	0.351	0.508
6	√		√	1 460	43.3	0.666	0.509	0.537	0.318	0.490
7		√	√	2 555	54.9	0.679	0.552	0.586	0.358	0.528
8	√	√	√	2 364	50.6	0.693	0.552	0.590	0.365	0.537
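
One way to read the ablation table is to express every configuration as a difference from the baseline in row 1. The sketch below does this for the GFLOPs and mAP_0.5 columns only. The tuple encoding of the (CPPA, DWRC, CG Block) switches and the `label` helper are assumptions made for illustration, and the numbers are copied from the table.

```python
# (CPPA, DWRC, CG Block) -> (GFLOPs, mAP_0.5), copied from the ablation table.
ablation = {
    (False, False, False): (57.0, 0.556),
    (True, False, False): (57.4, 0.571),
    (False, True, False): (60.7, 0.570),
    (False, False, True): (47.6, 0.560),
    (True, True, False): (61.1, 0.571),
    (True, False, True): (43.3, 0.537),
    (False, True, True): (54.9, 0.586),
    (True, True, True): (50.6, 0.590),
}

MODULES = ("CPPA", "DWRC", "CG Block")


def label(config):
    """Human-readable name for a configuration tuple."""
    enabled = [name for name, on in zip(MODULES, config) if on]
    return " + ".join(enabled) or "baseline"


base_flops, base_map = ablation[(False, False, False)]

# Difference of each configuration from the baseline row.
deltas = {
    config: (round(flops - base_flops, 1), round(m_ap - base_map, 3))
    for config, (flops, m_ap) in ablation.items()
}

for config, (d_flops, d_map) in deltas.items():
    print(f"{label(config):<24} dGFLOPs={d_flops:+.1f} dmAP_0.5={d_map:+.3f}")

best_map = max(ablation, key=lambda c: ablation[c][1])
lowest_flops = min(ablation, key=lambda c: ablation[c][0])
print("best mAP_0.5:", label(best_map))
print("lowest GFLOPs:", label(lowest_flops))
```

Read this way, the table shows that the full combination has the highest mAP_0.5 while still using 6.4 fewer GFLOPs than the baseline, whereas CPPA with CG Block alone has the lowest GFLOPs but an mAP_0.5 below the baseline.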
