Crowd counting method based on pixel-level attention mechanism

doi:10.11772/j.issn.1001-9081.2019050920

Journal of Computer Applications, 2020, Vol. 40, Issue (1): 56-61.

CHEN Meiyun, WANG Bisheng, CAO Guo, LIANG Yongbo

College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing Jiangsu 210094, China

Received: 2019-06-04  Revised: 2019-08-30  Online: 2019-09-24  Published: 2020-01-10
Corresponding author: CAO Guo
About the authors: CHEN Meiyun (born 1995), female, from Hangzhou, Zhejiang, M.S. candidate; research interests: deep learning, crowd counting. WANG Bisheng (born 1994), male, from Yancheng, Jiangsu, Ph.D. candidate, CCF member; research interests: deep learning, object detection. CAO Guo (born 1977), male, from Jinan, Shandong, professor, Ph.D., CCF member; research interests: image processing, pattern recognition. LIANG Yongbo (born 1995), male, from Shangqiu, Henan, M.S. candidate, CCF member; research interests: deep learning, image processing.
Supported by:
This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20191284) and the Dongyang Primary and Secondary School Safe Campus "Smart Eyes" Government Procurement Project (DYCG2018-A032).

Abstract

Abstract: To address uneven crowd distribution and the large number of parameters a network must learn, a high-density crowd counting method is proposed that consists of two parts: a Pixel-level Attention Mechanism (PAM) and an improved single-column crowd density estimation network. First, the PAM generates high-quality local crowd density maps by classifying crowd images at the pixel level: a Fully Convolutional Network (FCN) produces a density mask for each image, assigning every pixel to one of several density levels. Then, with the generated density masks as labels, the single-column crowd density estimation network learns more representative features with fewer parameters. Before this method was proposed, the Network for Congested Scene Recognition (CSRNet) had the smallest counting error on part_B of the Shanghaitech dataset, the UCF_CC_50 dataset and the WorldExpo'10 dataset. Compared with CSRNet, the proposed method reduces the Mean Absolute Error (MAE) and Mean Squared Error (MSE) on part_B of the Shanghaitech dataset by 8.49% and 4.37% respectively; reduces the MAE and MSE on the UCF_CC_50 dataset by 58.38% and 51.98% respectively, a substantial improvement; and reduces the overall average MAE on the WorldExpo'10 dataset by 1.16%. The experimental results show that, for unevenly distributed high-density crowds, combining the PAM with the single-column crowd density estimation network effectively improves both the accuracy and the training efficiency of crowd counting.
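The abstract does not state how the density levels are defined. As a minimal sketch only, assuming the levels come from thresholding a per-pixel density map, the kind of label a pixel-level classifier such as an FCN could be trained to predict might be built as follows. The function name `density_mask`, the toy density values, and the thresholds are all hypothetical and are not taken from the paper.

```python
import numpy as np


def density_mask(density_map, thresholds):
    """Assign each pixel an integer density level.

    Level 0 covers values below thresholds[0]; level k covers values in
    [thresholds[k-1], thresholds[k]); the last level covers everything at
    or above the final threshold. Thresholds must be increasing.
    """
    return np.digitize(density_map, bins=np.asarray(thresholds))


# Toy 2x3 "density map" (illustrative values, not real data).
toy_density = np.array([[0.000, 0.02, 0.2],
                        [0.005, 0.08, 0.5]])

# Hypothetical thresholds separating three density levels.
toy_thresholds = [0.01, 0.1]

mask = density_mask(toy_density, toy_thresholds)
print(mask)
```

With two thresholds the mask contains three levels (0, 1, 2); the number of levels and where the cut points fall are design choices that the abstract leaves open.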

Key words: uneven distribution of crowd, Pixel-level Attention Mechanism (PAM), single-column crowd density estimation network, high-density crowd, Fully Convolutional Network (FCN), density mask
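The percentages quoted in the abstract are relative reductions of MAE and MSE with respect to CSRNet. The sketch below shows, under stated assumptions, how such figures are typically computed from per-image predicted and ground-truth counts. All counts and error values in it are toy numbers, not results from the paper, and the function names are illustrative. Crowd-counting papers often report the square root of the mean squared error under the name MSE; the abstract does not say which form is used, so the `root` flag here is an assumption that covers both.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts, root=False):
    """Mean squared error; with root=True, its square root."""
    n = len(true_counts)
    value = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n
    return math.sqrt(value) if root else value


def relative_reduction(baseline_error, proposed_error):
    """Percentage by which proposed_error is lower than baseline_error."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


# Toy counts for three images (illustrative only).
true_counts = [100, 50, 200]
pred_counts = [105, 48, 210]

toy_mae = mae(pred_counts, true_counts)
toy_mse = mse(pred_counts, true_counts)
toy_rmse = mse(pred_counts, true_counts, root=True)

# Toy comparison: a baseline error of 10.0 against a proposed error of 9.0.
toy_reduction = relative_reduction(10.0, 9.0)
print(toy_mae, toy_mse, toy_rmse, toy_reduction)
```

A positive value from `relative_reduction` means the proposed method has the lower error; a negative value would mean it is worse than the baseline.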

CLC number: TP391

CHEN Meiyun, WANG Bisheng, CAO Guo, LIANG Yongbo. Crowd counting method based on pixel-level attention mechanism[J]. Journal of Computer Applications, 2020, 40(1): 56-61.


