Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (6): 1708-1715. DOI: 10.11772/j.issn.1001-9081.2017.06.1708

• Artificial Intelligence •

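Improved pedestrian detection method based on convolutional neural network

XU Chao1, YAN Shengye1,2

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science & Technology), Nanjing 210044;
  2. Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing 210044
  • Received: 2016-10-14 Revised: 2017-01-13 Online: 2017-06-10 Published: 2017-06-14
  • Corresponding author: YAN Shengye
  • About the authors: XU Chao (born 1991), male, from Yancheng, Jiangsu, M.S. candidate; research interests: pedestrian detection, convolutional neural networks, object detection. YAN Shengye (born 1978), male, from Xinxiang, Henan, professor, Ph.D.; research interests: object detection and recognition, object tracking, feature point localization.
  • Supported by:
    National Natural Science Foundation of China (61300163).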

Improved pedestrian detection method based on convolutional neural network

XU Chao1, YAN Shengye1,2   

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science & Technology), Nanjing, Jiangsu 210044, China;
  2. Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing, Jiangsu 210044, China
  • Received: 2016-10-14 Revised: 2017-01-13 Online: 2017-06-10 Published: 2017-06-14
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61300163).

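Abstract: To enable a convolutional neural network (CNN) to select a better model and obtain more accurately localized detection boxes in pedestrian detection, an improved CNN-based pedestrian detection method is proposed. The improvements address two questions: how to decide the number of training iterations over the CNN samples, and how to merge overlapping windows. First, regarding the number of iterations, multiple CNN classification models are trained in sequential iterations, and a model selection strategy is proposed that is based on validation-set accuracy and on the stability of that accuracy across the iterative series of classifiers, so that the finally selected classifier generalizes better. Second, a mechanism for merging multiple refined regression boxes is proposed that differs from non-maximum suppression (NMS). The coarse localization boxes output by the CNN detection stage serve as input; the CNN refinement stage is then applied to each coarse box to obtain its corresponding refined regression box. Finally, the refined regression boxes are merged, taking into account the probability that each one is correct. More precisely, the final merged window is obtained as the probability-weighted sum of the multiple related refined regression boxes. A series of experiments on both improvements was conducted on ETH, a public pedestrian detection test dataset widely used internationally. The results show that both improvements effectively raise the detection performance of the system; under the same test conditions, the method combining the two improvements outperforms the Fast R-CNN algorithm by 5.06 percentage points.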
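The abstract does not state the selection rule itself, so the following is only a minimal sketch of one way validation accuracy and its stability across sequentially trained classifiers might be combined. The scoring rule (a checkpoint's accuracy minus a penalty times the standard deviation of accuracy in a local window), the function name `select_checkpoint`, and the parameters `window` and `penalty` are assumptions made for illustration, not the authors' formulation.

```python
from statistics import pstdev


def select_checkpoint(val_acc, window=3, penalty=1.0):
    """Return the index of the checkpoint with the best stability-adjusted score.

    val_acc: validation accuracy of each sequentially trained classifier.
    window:  hypothetical neighbourhood size used to measure stability.
    penalty: hypothetical weight on the local standard deviation.
    """
    if not val_acc:
        raise ValueError("val_acc must not be empty")
    half = window // 2
    best_idx, best_score = 0, float("-inf")
    for i, acc in enumerate(val_acc):
        # Accuracies of the neighbouring checkpoints (clipped at the ends).
        lo, hi = max(0, i - half), min(len(val_acc), i + half + 1)
        neighbourhood = val_acc[lo:hi]
        # Assumed rule: high accuracy is rewarded, local fluctuation is penalised.
        score = acc - penalty * pstdev(neighbourhood)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Illustrative, made-up accuracies: index 1 is the raw maximum but an isolated spike.
accs = [0.80, 0.95, 0.70, 0.90, 0.91, 0.90, 0.89]
chosen = select_checkpoint(accs)
```

With these made-up numbers, picking the raw maximum would choose the isolated spike at index 1, whereas the stability-adjusted score prefers a checkpoint inside the plateau of consistently high accuracy.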

Keywords: deep learning, convolutional neural network, image classification, pedestrian detection

Abstract: To choose a better model and obtain more accurate bounding boxes when using a Convolutional Neural Network (CNN) for pedestrian detection, an improved CNN-based pedestrian detection method is proposed. The improvements cover two aspects: how to determine the number of training iterations over the CNN samples, and how to merge multiple responses to the same object. For the first aspect, multiple candidate CNN classifiers are learned from different training samples in successive training iterations, and a new strategy is proposed to select the model with better generalization ability. The strategy considers both the accuracy on the validation set and the stability of that accuracy over the iterative training procedure. For the second aspect, a refined bounding-box combination method is proposed that differs from Non-Maximum Suppression (NMS). The coarse bounding boxes output by the CNN detection procedure are taken as input, and the CNN accurate positioning process is applied to each coarse bounding box to obtain its corresponding refined bounding box. Finally, the multiple refined bounding boxes are merged by taking into account the probability that each bounding box is correct. More precisely, the final output bounding box is the weighted average of the multiple relevant refined bounding boxes, with their correctness probabilities as weights. To evaluate the two proposed improvements, comprehensive experiments were conducted on ETH, a widely recognized pedestrian detection benchmark dataset. The experimental results show that both improvements effectively raise the detection performance of the system. Compared with the baseline Fast R-CNN (Fast Region-based CNN) method, the proposed method combining the two improvements raises detection performance by 5.06 percentage points under the same test conditions.
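The probability-weighted merging described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the abstract does not say how "relevant" boxes are grouped, so the sketch assumes grouping by IoU with the highest-probability remaining box. The names `iou`, `merge_boxes` and the threshold `iou_thresh` are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes, probs, iou_thresh=0.5):
    """Merge overlapping refined boxes by a probability-weighted average.

    Unlike NMS, which keeps only the top-scoring box of each group, every box
    in a group contributes to the output in proportion to its probability.
    Grouping by IoU with the top-scoring box is an assumption of this sketch.
    """
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    used, merged = set(), []
    for i in order:
        if i in used:
            continue
        # The seed box always belongs to its own group.
        group = [i] + [j for j in order
                       if j != i and j not in used
                       and iou(boxes[i], boxes[j]) >= iou_thresh]
        used.update(group)
        total = sum(probs[j] for j in group)
        if total > 0:
            box = tuple(sum(probs[j] * boxes[j][k] for j in group) / total
                        for k in range(4))
        else:
            box = tuple(boxes[i])  # all weights zero: fall back to the seed box
        merged.append((box, probs[i]))
    return merged


# Two overlapping detections of one pedestrian plus one distant detection.
result = merge_boxes(
    [(0, 0, 10, 10), (2, 0, 12, 10), (100, 100, 110, 110)],
    [0.75, 0.25, 0.5],
)
```

In this toy input the first two boxes overlap with IoU of about 0.67, so they are fused into a single box pulled toward the more probable one, while the distant box passes through unchanged.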

Key words: deep learning, Convolutional Neural Network (CNN), image classification, pedestrian detection
