CDC-DETR: multi-scale real-time human-vehicle detection method for complex traffic scenarios

doi:10.11772/j.issn.1001-9081.2025040472

Journal of Computer Applications, 2026, Vol. 46, Issue (4): 1283-1291. DOI: 10.11772/j.issn.1001-9081.2025040472

• Multimedia computing and computer simulation •

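Xinyi YAN¹, Linglong ZHU²,³,⁴, Yonghong ZHANG¹,²,⁴

1. School of Automation, Nanjing University of Information Science and Technology, Nanjing Jiangsu 210044, China
2. School of Internet of Things Engineering, Wuxi University, Wuxi Jiangsu 214105, China
3. Traffic Management Research Institute of the Ministry of Public Security, Wuxi Jiangsu 214151, China
4. Wuxi Key Laboratory of Telematics, Wuxi University, Wuxi Jiangsu 214105, China

Received: 2025-05-06 Revised: 2025-07-21 Accepted: 2025-07-23 Online: 2026-04-21 Published: 2026-04-10
Corresponding author: Yonghong ZHANG
About the authors: YAN Xinyi, born in 2002 in Yancheng, Jiangsu, female, M. S. candidate. Her research interests include computer vision and deep learning.
ZHU Linglong, born in 1993 in Nantong, Jiangsu, male, Ph. D., associate professor. His research interests include big data for transportation.
Supported by:
National Natural Science Foundation of China (42175157); National Natural Science Foundation of China (42305158); Program of "Light of Taihu Lake" Science and Technology (Basic Research) of Wuxi City, Jiangsu Province (K20231021)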

Abstract:

The complexity and variability of traffic scenarios challenge existing human-vehicle detection algorithms: when dealing with occlusion, illumination changes, and multi-scale targets, they tend to have insufficient accuracy and low computational efficiency. To address these problems, an improved detection model, CDC-DETR (CPPA-DWRC-CGNET-DETR), was developed based on the RT-DETR (Real-Time DEtection TRansformer) architecture. Firstly, a Context Pre-activation Pooling Attention (CPPA) module was designed to strengthen long-range dependencies and optimize feature extraction. Secondly, a Dilation-Wise Residual Connection (DWRC) module was introduced to improve multi-scale feature representation. Thirdly, a lightweight Context Guided Block (CG Block) was proposed to fuse local, surrounding, and global information while reducing computational cost. Finally, these modules were integrated to construct a high-accuracy, efficient real-time human-vehicle detection model suitable for complex traffic scenarios. Experimental results on the BDD100K dataset show that, compared with RT-DETR at an Intersection over Union (IoU) threshold of 0.5, CDC-DETR improves the mean Average Precision (mAP) by 6.12%, increases the recall by 4.35%, and decreases the number of floating-point operations by 11.23%, significantly improving computational efficiency and providing an efficient solution for deployment on edge devices.

Key words: assisted driving, human-vehicle detection, Transformer, intelligent traffic perception, multi-scale feature fusion
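
The IoU threshold of 0.5 quoted in the abstract decides when a predicted box counts as a match for a ground-truth box. As a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (the names `box_iou` and `is_match` and the sample boxes are illustrative and not taken from the paper), the overlap test could look like this:

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count."""
    return box_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Hypothetical boxes: the prediction covers exactly half of the union.
    print(box_iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
    print(is_match((0, 0, 10, 10), (0, 0, 10, 5)))  # True
```

mAP at IoU 0.5 then averages precision over recall levels using this matching rule, whereas mAP_0.5-0.95 repeats the evaluation over a range of stricter thresholds.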

CLC number: TP391.41

Xinyi YAN, Linglong ZHU, Yonghong ZHANG. CDC-DETR: multi-scale real-time human-vehicle detection method for complex traffic scenarios[J]. Journal of Computer Applications, 2026, 46(4): 1283-1291.

Figures/Tables 12

Class	Precision	Recall	mAP_0.5	mAP_0.5-0.95	F1 score	Accuracy
Overall	0.693	0.552	0.590	0.365	0.615	0.537
Pedestrian	0.702	0.491	0.579	0.256	0.579	0.690
Car	0.806	0.721	0.789	0.474	0.761	0.840
Bus	0.640	0.474	0.490	0.376	0.545	0.550
Truck	0.625	0.524	0.504	0.354	0.571	0.580
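
The F1 scores in the table above can be cross-checked against its precision and recall columns. The sketch below assumes the standard definition F1 = 2PR / (P + R); the names `REPORTED` and `f1_score` are illustrative, and small gaps are expected because the inputs are already rounded to three decimals.

```python
# (precision, recall, reported F1) per class, copied from the table above.
REPORTED = {
    "Overall": (0.693, 0.552, 0.615),
    "Pedestrian": (0.702, 0.491, 0.579),
    "Car": (0.806, 0.721, 0.761),
    "Bus": (0.640, 0.474, 0.545),
    "Truck": (0.625, 0.524, 0.571),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Largest absolute gap between the recomputed and the reported F1.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in REPORTED.values())

if __name__ == "__main__":
    for name, (p, r, f1) in REPORTED.items():
        print(f"{name}: recomputed {f1_score(p, r):.3f}, reported {f1:.3f}")
    print(f"largest gap: {max_gap:.4f}")
```

Every recomputed value agrees with the reported one to within 0.002, which is consistent with rounding of the inputs.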

Model	Precision	Recall	mAP_0.5	mAP_0.5-0.95	F1 score	Parameters/10⁶	Floating-point operations/GFLOPs	Accuracy
SSD	0.469	0.233	0.265	0.125	0.311	2 628	62.7	0.200
Faster R-CNN	0.338	0.240	0.178	0.107	0.281	2 848	188.2	0.137
Mask R-CNN	0.585	0.504	0.548	0.316	0.542	4 143	90.9	0.408
RetinaNet	0.633	0.450	0.507	0.299	0.526	3 668	84.5	0.377
RTMDet	0.454	0.400	0.348	0.196	0.425	2 760	54.1	0.261
YOLO11	0.648	0.553	0.596	0.383	0.598	2 003	67.7	0.434
YOLOv10	0.699	0.525	0.590	0.378	0.600	1 645	63.4	0.424
YOLOv8	0.724	0.539	0.599	0.384	0.616	2 584	78.7	0.443
YOLOv5	0.676	0.535	0.584	0.369	0.598	2 504	64.0	0.435
YOLOv3	0.690	0.547	0.605	0.387	0.609	10 366	282.2	0.456
RT-DETR	0.682	0.529	0.556	0.338	0.597	1 987	57.0	0.481
CDC-DETR	0.693	0.552	0.590	0.365	0.615	2 364	50.6	0.537
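
The percentages quoted in the abstract (6.12%, 4.35%, 11.23%) are reproduced by the RT-DETR and CDC-DETR rows above when they are read as relative changes with respect to the RT-DETR baseline. The following sketch shows that arithmetic; the dictionary keys and the helper name `relative_change` are illustrative choices, not names from the paper.

```python
def relative_change(baseline, new):
    """Signed change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Values copied from the RT-DETR and CDC-DETR rows of the table above.
rt_detr = {"recall": 0.529, "map50": 0.556, "gflops": 57.0}
cdc_detr = {"recall": 0.552, "map50": 0.590, "gflops": 50.6}

changes = {key: relative_change(rt_detr[key], cdc_detr[key]) for key in rt_detr}

if __name__ == "__main__":
    for key, value in changes.items():
        print(f"{key}: {value:+.2f}%")
```

Read this way, the absolute gain in mAP_0.5 is 0.034 (0.556 to 0.590), and the negative sign on the GFLOPs entry corresponds to the reported reduction in computation.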

No.	CPPA	DWRC	CG Block	Parameters/10⁶	Floating-point operations/GFLOPs	Precision	Recall	mAP_0.5	mAP_0.5-0.95	Accuracy
1				1 987	57.0	0.682	0.529	0.556	0.338	0.481
2	√			2 041	57.4	0.715	0.533	0.571	0.348	0.492
3		√		2 453	60.7	0.669	0.534	0.570	0.351	0.483
4			√	1 652	47.6	0.669	0.534	0.560	0.343	0.511
5	√	√		2 507	61.1	0.702	0.532	0.571	0.351	0.508
6	√		√	1 460	43.3	0.666	0.509	0.537	0.318	0.490
7		√	√	2 555	54.9	0.679	0.552	0.586	0.358	0.528
8	√	√	√	2 364	50.6	0.693	0.552	0.590	0.365	0.537
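
One way to read the ablation table is to ask which module combinations raise mAP_0.5 and lower the GFLOPs at the same time relative to the baseline in row 1. The sketch below encodes only the columns needed for that question; the `AblationRow` type and the `improves_both` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class AblationRow(NamedTuple):
    no: int
    cppa: bool
    dwrc: bool
    cg_block: bool
    gflops: float
    map50: float


# Module flags, GFLOPs and mAP_0.5 copied from the ablation table above.
ROWS = [
    AblationRow(1, False, False, False, 57.0, 0.556),
    AblationRow(2, True, False, False, 57.4, 0.571),
    AblationRow(3, False, True, False, 60.7, 0.570),
    AblationRow(4, False, False, True, 47.6, 0.560),
    AblationRow(5, True, True, False, 61.1, 0.571),
    AblationRow(6, True, False, True, 43.3, 0.537),
    AblationRow(7, False, True, True, 54.9, 0.586),
    AblationRow(8, True, True, True, 50.6, 0.590),
]


def improves_both(row, baseline):
    """True when a configuration is more accurate and cheaper than the baseline."""
    return row.map50 > baseline.map50 and row.gflops < baseline.gflops


baseline = ROWS[0]
winners = [row.no for row in ROWS[1:] if improves_both(row, baseline)]
best = max(ROWS, key=lambda row: row.map50)

if __name__ == "__main__":
    print("more accurate and cheaper than baseline:", winners)
    print("highest mAP_0.5: row", best.no)
```

Under this reading, rows 4, 7 and 8 improve on the baseline in both respects, all three include the CG Block, and the full model in row 8 has the highest mAP_0.5.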
