Improved Pedestrian Detection Method Based on Convolutional Neural Network

doi:10.11772/j.issn.1001-9081.2017.06.1708

Abstract: To help a Convolutional Neural Network (CNN) select a better model and produce more accurately localized bounding boxes in pedestrian detection, an improved CNN-based pedestrian detection method was proposed. The improvements address two questions: how many training iterations to run over the CNN training samples, and how to merge overlapping detection windows. First, to decide the training iteration, multiple candidate CNN classifiers were trained sequentially in successive iterations. A model selection strategy was then proposed that considers both the accuracy on a validation set and the stability of that accuracy across the series of iteratively trained classifiers, so that the selected classifier generalizes better. Second, a mechanism for merging multiple refined bounding boxes was proposed as an alternative to Non-Maximum Suppression (NMS). The coarse bounding boxes output by the CNN detection stage were taken as input, and a CNN refined-localization step was applied to each coarse box to obtain its corresponding refined (regressed) bounding box. The refined bounding boxes were then merged with the correctness probability of each box taken into account. More precisely, the final merged box is the probability-weighted average of the related refined bounding boxes. A series of experiments on ETH, a widely used public pedestrian detection benchmark, shows that both improvements effectively increase detection performance. Under the same test conditions, the method combining both improvements outperforms Fast Region-based CNN (Fast R-CNN) by 5.06 percentage points.

Key words: deep learning, Convolutional Neural Network (CNN), image classification, pedestrian detection
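The abstract does not give the exact rule for choosing among the iteratively trained classifiers, so the following is only a minimal Python sketch under stated assumptions. Each saved classifier i is assumed to have a validation accuracy `val_acc[i]`. Its hypothetical score is that accuracy minus a penalty on the standard deviation of the accuracies of neighbouring iterations, so an isolated spike ranks below a steady plateau. The function `select_iteration` and its `window` and `penalty` parameters are illustrative names and are not taken from the paper.

```python
from statistics import pstdev
from typing import Sequence


def select_iteration(val_acc: Sequence[float], window: int = 1, penalty: float = 1.0) -> int:
    """Pick the training iteration whose classifier to keep (hypothetical rule).

    val_acc[i] is the validation accuracy of the classifier saved after
    iteration i. Score = the accuracy itself minus `penalty` times the
    population standard deviation of the accuracies within `window`
    iterations on either side, used here as a proxy for stability.
    """
    if not val_acc:
        raise ValueError("need at least one validation accuracy")
    best_idx, best_score = 0, float("-inf")
    for i, acc in enumerate(val_acc):
        lo, hi = max(0, i - window), min(len(val_acc), i + window + 1)
        neighbourhood = list(val_acc[lo:hi])
        score = acc - penalty * pstdev(neighbourhood)
        if score > best_score:  # strict '>' keeps the earliest of tied iterations
            best_idx, best_score = i, score
    return best_idx


# Accuracy spikes at iteration 2 but is steadier around iteration 6.
val_acc = [0.80, 0.82, 0.93, 0.83, 0.88, 0.89, 0.90, 0.89]
print(select_iteration(val_acc))               # 6
print(select_iteration(val_acc, penalty=0.0))  # 2 (plain argmax)
```

Setting `penalty=0.0` reduces the rule to picking the single highest validation accuracy, which is the baseline that a stability term is meant to improve on.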
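The box-merging step can be sketched in Python as follows. The abstract states only that the output box is a probability-weighted average of related refined boxes. It does not specify here how boxes are judged "related", so this sketch assumes a greedy IoU grouping with a hypothetical threshold `iou_thr=0.5`, and it reports the highest probability in each group as the merged score. The names `iou` and `weighted_merge` are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(boxes: List[Box], probs: List[float], iou_thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Merge refined boxes into one box per object by probability-weighted averaging.

    Assumed grouping rule: seed a group with the most probable remaining box
    and add every remaining box whose IoU with the seed is >= iou_thr.
    NMS would output the seed alone; here each output coordinate is the
    probability-weighted mean over the whole group.
    """
    remaining = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    merged = []
    while remaining:
        seed = remaining[0]
        group = [seed] + [i for i in remaining[1:] if iou(boxes[seed], boxes[i]) >= iou_thr]
        weights = [probs[i] for i in group]
        total = sum(weights)
        if total <= 0:  # all-zero probabilities: fall back to a plain mean
            weights, total = [1.0] * len(group), float(len(group))
        box = tuple(
            sum(w * boxes[i][k] for w, i in zip(weights, group)) / total for k in range(4)
        )
        merged.append((box, probs[seed]))  # keep the seed's probability as the score
        grouped = set(group)
        remaining = [i for i in remaining if i not in grouped]
    return merged


boxes = [(10, 10, 50, 110), (12, 8, 52, 108), (100, 100, 140, 200)]
probs = [0.9, 0.6, 0.8]
for box, p in weighted_merge(boxes, probs):
    print([round(c, 2) for c in box], p)
# [10.8, 9.2, 50.8, 109.2] 0.9
# [100.0, 100.0, 140.0, 200.0] 0.8
```

For the first group, NMS would return (10, 10, 50, 110) unchanged. The weighted merge instead shifts that box toward the second one in proportion to the second box's probability.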

CLC number: TP391.41

XU Chao, YAN Shengye. Improved pedestrian detection method based on convolutional neural network[J]. Journal of Computer Applications, 2017, 37(6): 1708-1715.
