
An improved pedestrian detection method based on convolutional neural network (CNN)

XU Chao (徐超), YAN Shengye (闫胜业)

  1. Nanjing University of Information Science and Technology
  • Received: 2016-10-14 Revised: 2016-11-28 Online: 2016-11-28
  • Corresponding author: XU Chao
  • Contact: dachao

Abstract: To help a convolutional neural network (CNN) select a better model and produce more accurately localized bounding boxes in pedestrian detection, an improved CNN-based pedestrian detection method is proposed. The improvements address two questions: how to decide the number of training iterations over the samples, and how to merge multiple overlapping responses to the same object. For the first, several candidate CNN classifiers are trained sequentially over successive training iterations, and a new strategy selects among them using both the accuracy on the validation set and the stability of that accuracy across the series of classifiers, so that the selected classifier generalizes better. For the second, a mechanism for merging multiple refined bounding boxes is proposed that differs from non-maximum suppression. It takes the coarse bounding boxes output by the CNN detection procedure as input, applies a CNN refinement procedure to each coarse box to obtain a corresponding refined bounding box, and finally merges the refined boxes while taking into account the probability that each one is correct. More precisely, the final output bounding box is the average of the relevant refined bounding boxes, weighted by their correctness probabilities. Both techniques are evaluated in a series of experiments on ETH, a widely used public pedestrian detection benchmark. The results show that each technique improves detection performance on its own, and that under the same evaluation protocol the method combining both reduces the average miss rate by 5.06 percentage points compared with the Fast R-CNN baseline.
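The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: the function names `select_checkpoint` and `merge_boxes`, the stability score (windowed mean accuracy minus a penalty times the standard deviation), the greedy IoU-based grouping of "relevant" boxes, and the default values of `window`, `penalty`, and `iou_thresh`. It is not the authors' implementation.

```python
from statistics import mean, pstdev


def select_checkpoint(val_acc, window=1, penalty=1.0):
    """Return the index of a checkpoint that is both accurate and stable.

    Hypothetical criterion (not from the paper): the mean validation
    accuracy over neighbouring checkpoints minus `penalty` times their
    population standard deviation.
    """
    best_idx, best_score = 0, float("-inf")
    for i in range(len(val_acc)):
        lo, hi = max(0, i - window), min(len(val_acc), i + window + 1)
        neighbours = val_acc[lo:hi]
        score = mean(neighbours) - penalty * pstdev(neighbours)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes, probs, iou_thresh=0.5):
    """Merge overlapping refined boxes by probability-weighted averaging.

    Assumes every probability is positive. Grouping is an assumed rule:
    boxes are visited in order of decreasing probability, and each unused
    box collects all unused boxes whose IoU with it reaches `iou_thresh`.
    Returns a list of (merged_box, highest_probability_in_group) pairs.
    """
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    used, merged = set(), []
    for i in order:
        if i in used:
            continue
        # The seed box always belongs to its own group.
        group = [i] + [
            j for j in order
            if j != i and j not in used and iou(boxes[i], boxes[j]) >= iou_thresh
        ]
        used.update(group)
        total = sum(probs[j] for j in group)
        box = tuple(
            sum(probs[j] * boxes[j][k] for j in group) / total for k in range(4)
        )
        merged.append((box, max(probs[j] for j in group)))
    return merged
```

Under these assumptions, `select_checkpoint` prefers a checkpoint sitting on a plateau of consistently high validation accuracy over an isolated peak, and `merge_boxes` lets every overlapping refined box contribute to the final coordinates in proportion to its probability, whereas non-maximum suppression would keep only the highest-scoring box and discard the rest.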

Key words: deep learning, convolutional neural network, image classification, pedestrian detection

CLC number: