Crowd counting using multi-scale multi-task convolutional neural network

Journal of Computer Applications, 2019, Vol. 39, Issue (1): 199-204. DOI: 10.11772/j.issn.1001-9081.2018051132

CAO Jinmeng¹, NI Rongrong², YANG Biao¹

1. School of Information Science & Engineering, Changzhou University, Changzhou, Jiangsu 213164, China;
2. Department of Energy Management, Changzhou Vocational Institute of Textile and Garment, Changzhou, Jiangsu 213164, China

Received: 2018-06-01 Revised: 2018-08-03 Online: 2019-01-10 Published: 2019-01-21
Corresponding author: YANG Biao
About the authors: CAO Jinmeng (1994-), female, native of Jiangyin, Jiangsu, M.S. candidate, CCF member; research interests: deep learning, pattern recognition. NI Rongrong (1987-), female, native of Nantong, Jiangsu, M.S.; research interests: deep learning, pattern recognition. YANG Biao (1987-), male, native of Changzhou, Jiangsu, lecturer, Ph.D., CCF member; research interests: deep learning, pattern recognition.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61501060), the Natural Science Foundation of Jiangsu Province (BK20150271), and the Open Project of the Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province (BM20082061708).

Abstract

Crowd counting is of significant value in intelligent surveillance. To address scale variation, non-uniform density distribution and occlusion in crowds, a crowd counting method using a Multi-scale Multi-task Convolutional Neural Network (MMCNN) was proposed. Firstly, a novel adaptive human-shaped kernel was used to generate density maps that describe crowd information, eliminating the influence of crowd occlusion. Secondly, scale variation was handled by constructing a multi-scale convolutional neural network, and non-uniform density distribution was handled by a multi-task learning mechanism that simultaneously estimates the density map and the density level of the crowd. Finally, a weighted loss function was designed to improve counting accuracy. Evaluations on the UCF_CC_50 and World Expo'10 datasets verified the effectiveness of the adaptive human-shaped kernel. The experimental results show that, compared with the method proposed by Sindagi et al. (SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ:IEEE, 2017:1-6), the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the proposed method on the UCF_CC_50 dataset are decreased by about 1.7 and 45 respectively. Compared with the method proposed by Zhang et al. (ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2016:589-597), the MAE of the proposed method on the World Expo'10 dataset is decreased by about 1.5. In addition, on a real-world bus dataset the counting error is only 0 to 3 people, which indicates that the proposed approach is practical.

Key words: crowd counting, multi-scale, multi-task learning, Convolutional Neural Network (CNN), adaptive human-shaped kernel, weighted loss function
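The abstract reports results as MAE and MSE over per-image counts obtained from density maps. The following is a minimal sketch of how such figures are typically computed, not the authors' evaluation code. It assumes that the predicted count of an image is the sum of its density map, and it exposes a `root` flag because crowd-counting papers commonly report the square root of the mean squared error under the name MSE; the abstract does not say which form is used here. All function names are hypothetical.

```python
import math
from typing import Sequence


def count_from_density(density_map: Sequence[Sequence[float]]) -> float:
    """Estimated head count, taken as the sum of all density-map values (assumption)."""
    return sum(sum(row) for row in density_map)


def mae(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mse(pred: Sequence[float], truth: Sequence[float], root: bool = True) -> float:
    """Mean squared error of the counts.

    With root=True the square root is returned, which is the convention
    often used in crowd-counting literature (assumed, not confirmed here).
    """
    mean_sq = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    return math.sqrt(mean_sq) if root else mean_sq


# Toy example: two images with made-up counts.
predicted = [10.0, 20.0]
ground_truth = [12.0, 17.0]
toy_mae = mae(predicted, ground_truth)
toy_mse = mse(predicted, ground_truth)
toy_count = count_from_density([[0.5, 0.5], [1.0, 0.0]])
```

MAE reflects average accuracy, while the squared-error term penalizes large misses on individual images more heavily, which is why the two are usually reported together.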

CLC Number: TP391.4; TP18

CAO Jinmeng, NI Rongrong, YANG Biao. Crowd counting using multi-scale multi-task convolutional neural network[J]. Journal of Computer Applications, 2019, 39(1): 199-204.

References

[1] RYAN D, DENMAN S, SRIDHARAN S, et al. An evaluation of crowd counting methods, features and regression models[J]. Computer Vision and Image Understanding, 2015, 130(C):1-17.
[2] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9):1627-1645.
[3] GAO C, LIU J, FENG Q, et al. People-flow counting in complex environments by combining depth and color information[J]. Multimedia Tools and Applications, 2016, 75(15):9315-9331.
[4] LUO J, WANG J, XU H, et al. Real-time people counting for indoor scenes[J]. Signal Processing, 2016, 124:27-35.
[5] ANTIC B, LETIC D, CULIBRK D, et al. K-means based segmentation for real-time zenithal people counting[C]//Proceedings of the 2009 16th IEEE International Conference on Image Processing. Piscataway, NJ:IEEE, 2009:2565-2568.
[6] RAO A S, GUBBI J, MARUSIC S, et al. Estimation of crowd density by clustering motion cues[J]. The Visual Computer, 2015, 31(11):1533-1552.
[7] CHAN A B, LIANG Z S J, VASCONCELOS N. Privacy preserving crowd monitoring:counting people without people models or tracking[C]//Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2008:1-7.
[8] JI L N, CHEN Q K, CHEN Y J, et al. Real-time crowd counting method from video stream based on GPU[J]. Journal of Computer Applications, 2017, 37(1):145-152. (in Chinese)
[9] HASHEMZADEH M, FARAJZADEH N. Combining keypoint-based and segment-based features for counting people in crowded scenes[J]. Information Sciences, 2016, 345:199-216.
[10] SIVA P, SHAFIEE M J, JAMIESON M, et al. Scene invariant crowd segmentation and counting using scale-normalized Histogram of Moving Gradients (HoMG)[J]. arXiv preprint arXiv:1602.00386, 2016.
[11] ZHANG C, LI H, WANG X, et al. Cross-scene crowd counting via deep convolutional neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2015:833-841.
[12] OÑORO-RUBIO D, LÓPEZ-SASTRE R J. Towards perspective-free object counting with deep learning[C]//Proceedings of the 2016 European Conference on Computer Vision. Berlin:Springer, 2016:615-629.
[13] HU Y, CHANG H, NIAN F, et al. Dense crowd counting from still images with convolutional neural networks[J]. Journal of Visual Communication and Image Representation, 2016, 38:530-539.
[14] SHENG B, SHEN C, LIN G, et al. Crowd counting via weighted VLAD on dense attribute feature maps[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 28(8):1788-1797.
[15] KANG D, DHAR D, CHAN A B. Crowd counting by adapting convolutional neural networks with side information[J]. arXiv preprint arXiv:1611.06748, 2016.
[16] SHI Z L, YE Y D, WU Y P, et al. Crowd counting using rank-based spatial pyramid pooling network[J]. Acta Automatica Sinica, 2016, 42(6):866-874. (in Chinese)
[17] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2016:589-597.
[18] SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]//Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ:IEEE, 2017:1-6.
[19] MARSDEN M, MCGUINNESS K, LITTLE S, et al. ResnetCrowd:a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification[C]//Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway, NJ:IEEE, 2017:1-7.
[20] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2016:589-597.
[21] ZEILER M D, RANZATO M, MONGA R, et al. On rectified linear units for speech processing[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE, 2013:3517-3521.
[22] WANG T, LI G, LEI J, et al. Crowd counting based on MMCNN in still images[C]//Proceedings of the 2017 Scandinavian Conference on Image Analysis. Berlin:Springer, 2017:468-479.
[23] FU M, XU P, LI X, et al. Fast crowd density estimation with convolutional neural networks[J]. Engineering Applications of Artificial Intelligence, 2015, 43:81-88.
[24] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2013:2547-2554.
[25] KANG D, MA Z, CHAN A B. Beyond counting:comparisons of density maps for crowd analysis tasks-counting, detection, and tracking[J]. IEEE Transactions on Circuits & Systems for Video Technology, 2017, PP(99):1-1.
[26] QIN X H, WANG X F, ZHOU X, et al. Counting people in various crowed density scenes using support vector regression[J]. Journal of Image and Graphics, 2013, 18(4):392-398. (in Chinese)
