Abstract: To address the uneven distribution of crowds and the large number of network parameters to be learned, a method for accurate high-density crowd counting was proposed that combines a Pixel-level Attention Mechanism (PAM) with an improved single-column crowd density estimation network. First, the PAM was used to generate a high-quality local crowd density map by classifying crowd images at the pixel level: a Fully Convolutional Network (FCN) generated a density mask for each image, dividing the pixels of the image into different density levels. Then, with the generated density mask as the label, the single-column crowd density estimation network was used to learn more representative features with fewer parameters. Before this method was proposed, the Network for Congested Scene Recognition (CSRNet) had the smallest counting error on part_B of the ShanghaiTech dataset, the UCF_CC_50 dataset and the WorldExpo'10 dataset. Compared with CSRNet, the proposed method reduced the Mean Absolute Error (MAE) and Mean Squared Error (MSE) by 8.49% and 4.37% respectively on part_B of the ShanghaiTech dataset and by 58.38% and 51.98% respectively on the UCF_CC_50 dataset, the latter being a substantial improvement; the overall average MAE on the WorldExpo'10 dataset was reduced by 1.16%. The experimental results show that, when counting unevenly distributed high-density crowds, combining the PAM with the single-column crowd density estimation network can effectively improve both the accuracy and the training efficiency of high-density crowd counting.
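The error reductions quoted above are relative changes in MAE and MSE against the CSRNet baseline. The following is a minimal sketch of how such figures can be computed from per-image counts; it is not the authors' evaluation code. The function names and the sample counts are hypothetical. Crowd-counting papers often report the square root of the mean squared count error under the name MSE; the abstract does not say which convention is used here, so the sketch exposes both through an assumed `root` flag.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth crowd counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts, root=False):
    """Mean squared count error; with root=True, return its square root."""
    n = len(true_counts)
    mean_sq = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n
    return math.sqrt(mean_sq) if root else mean_sq


def relative_reduction(baseline_error, proposed_error):
    """Percentage by which proposed_error is lower than baseline_error."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical per-image counts, for illustration only (not from the paper).
true_counts = [12, 17]
pred_counts = [10, 20]

print(mae(pred_counts, true_counts))             # absolute errors 2 and 3
print(mse(pred_counts, true_counts))             # squared errors 4 and 9
print(mse(pred_counts, true_counts, root=True))  # root of the value above
print(relative_reduction(10.0, 9.0))             # baseline 10.0 vs proposed 9.0
```

A relative reduction is only meaningful when both methods are scored on the same test split with the same metric convention, which is why the baseline and proposed errors are passed in explicitly.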