Journal of Computer Applications, 2019, Vol. 39, Issue (3): 700-705. DOI: 10.11772/j.issn.1001-9081.2018071587

Multi-class vehicle detection in surveillance video based on deep learning

XU Zihao (1,2), HUANG Weiquan (1,2), WANG Yin (1,2)

  1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China;
    2. Key Laboratory of Embedded Systems and Service Computing (Tongji University), Ministry of Education, Shanghai 201804, China
  • Received: 2018-08-01 Revised: 2018-09-18 Online: 2019-03-10 Published: 2019-03-11
  • Contact: XU Zihao
  • Supported by:
    This work is partially supported by the Shanghai Science and Technology Commission Fund Project (17511104502).

  • Author biographies: XU Zihao (1995-), male, born in Harbin, Heilongjiang, M.S. candidate; research interests: deep learning, computer vision. HUANG Weiquan (1994-), male, born in Guangzhou, Guangdong, M.S. candidate; research interests: deep learning, computer vision. WANG Yin (1979-), male, born in Changsha, Hunan, Ph.D., professor, doctoral supervisor; research interests: deep learning, artificial intelligence.

Abstract: Traditional machine learning methods for vehicle detection in traffic surveillance video are sensitive to objective factors such as video quality, shooting angle and weather, which leads to complex preprocessing, poor generalization and weak robustness. To address these problems, two deep learning models for multi-class vehicle detection were proposed by combining dilated convolution, feature pyramid and focal loss: an improved Faster R-CNN (Faster Region-based Convolutional Neural Network) and an improved SSD (Single Shot multibox Detector). Firstly, a dataset was built from 851 labeled images captured from surveillance video at different times. Secondly, the improved and original models were trained under the same training strategy. Finally, the average accuracy of each model was calculated for evaluation. Experimental results show that, compared with the original Faster R-CNN and SSD, the average accuracies of the improved models increase by 0.8 and 1.7 percentage points respectively. Both deep learning methods cope better with vehicle detection in complex situations than traditional methods. The improved Faster R-CNN has higher accuracy but lower speed, which makes it more suitable for offline video processing, while the improved SSD has lower accuracy but higher speed, which makes it more suitable for real-time video detection.
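Focal loss, one of the three components named in the abstract, was introduced by Lin et al. for RetinaNet as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. It down-weights easy, well-classified examples (mostly background anchors) so that training concentrates on hard ones. The Python sketch below computes the binary, alpha-balanced form for a single prediction. The defaults α = 0.25 and γ = 2 are the values reported for RetinaNet and are an assumption here, since the abstract does not state how the authors parameterized the loss or integrated it into Faster R-CNN and SSD; the function names `focal_loss` and `cross_entropy` are hypothetical.

```python
import math


def cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction (p = P(vehicle), y in {0, 1})."""
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-7) -> float:
    """Alpha-balanced binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma default to the RetinaNet values (an assumption, not the
    configuration used in the paper above).
    """
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Two background (y = 0) anchors: one easy, one hard to classify.
    cases = {"easy background": (0.05, 0), "hard background": (0.60, 0)}
    for name, (p, y) in cases.items():
        ce = cross_entropy(p, y)
        fl = focal_loss(p, y)
        print(f"{name}: CE={ce:.4f} FL={fl:.6f} FL/CE={fl / ce:.4f}")
```

With γ = 0 and α_t = 1 the expression reduces to ordinary cross-entropy, so the printed FL/CE ratio shows how strongly the modulating factor (1 - p_t)^γ suppresses the easy background anchor compared with the hard one.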

Key words: deep learning, vehicle detection, dilated convolution, feature pyramid, focal loss
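Dilated (atrous) convolution, listed in the key words, spaces the taps of a k-tap kernel d positions apart, so the kernel covers d(k - 1) + 1 input positions while keeping only k weights. This enlarges the receptive field without adding parameters or downsampling. The 1-D Python sketch below is a toy illustration of that definition, not the authors' layer configuration: the abstract gives neither the dilation rates nor where such layers sit in the networks, and the helper names `dilated_conv1d` and `effective_kernel_size` are hypothetical. Like common deep-learning libraries, it computes cross-correlation, here with no padding ("valid" mode).

```python
from typing import List, Sequence


def effective_kernel_size(k: int, dilation: int) -> int:
    """Number of input positions spanned by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


def dilated_conv1d(x: Sequence[float], w: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """'Valid' 1-D dilated cross-correlation: out[i] = sum_j w[j] * x[i + j*dilation]."""
    k = len(w)
    span = effective_kernel_size(k, dilation)
    if span > len(x):
        return []                          # kernel does not fit in the input
    return [sum(w[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]


if __name__ == "__main__":
    x = [float(i) for i in range(10)]      # toy ramp signal 0..9
    w = [1.0, 1.0, 1.0]                    # same 3 weights for every dilation rate
    for d in (1, 2, 4):
        print(f"dilation={d} span={effective_kernel_size(len(w), d)} "
              f"out={dilated_conv1d(x, w, d)}")
```

For the 0..9 ramp and an all-ones kernel, the three dilation rates reuse the same three weights but span 3, 5 and 9 samples respectively, which is why the valid-mode output shrinks as d grows.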
