Crowd counting model based on multi-scale multi-column convolutional neural network

doi:10.11772/j.issn.1001-9081.2019081437

Journal of Computer Applications, 2019, Vol. 39, Issue (12): 3445-3449. DOI: 10.11772/j.issn.1001-9081.2019081437

• Papers of the 17th China Conference on Machine Learning (CCML 2019) •

LU Jingang¹, ZHANG Li¹,²

1. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China;
2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China

Received: 2019-04-29 Revised: 2019-08-14 Online: 2019-09-04 Published: 2019-12-10
Contact: ZHANG Li
About the authors: LU Jingang (born 1993), male, from Nantong, Jiangsu, master's degree candidate; research interests: dense crowd counting, deep learning, machine learning. ZHANG Li (born 1975), female, from Zhangjiagang, Jiangsu, professor, Ph.D., CCF member; research interests: machine learning, pattern recognition.
Supported by:
The work is partially supported by the Six Talent Peak Project of Jiangsu Province (XYDXX-054).

Abstract

Abstract: To address the poor performance of crowd counting in surveillance videos and images caused by scale and perspective variation, a dense crowd counting model named Multi-scale Multi-column Convolutional Neural Network (MsMCNN) was proposed. Before feature extraction with MsMCNN, the dataset was processed with a Gaussian filter to obtain the ground-truth density maps of the images, and data augmentation was performed. With the multi-column convolutional neural network structure as its backbone, MsMCNN first extracted feature maps from multiple columns at multiple scales. Then, feature maps with the same resolution were concatenated within the same column to generate the estimated density map of the image. Finally, the estimated density map was integrated to obtain the crowd count. To verify the effectiveness of the proposed model, experiments were conducted on the Shanghaitech and UCF_CC_50 datasets. Compared with the classic methods Crowdnet, Multi-column Convolutional Neural Network (MCNN), Cascaded Multi-Task Learning (CMTL) and Scale-adaptive Convolutional Neural Network (SaCNN), MsMCNN reduced the Mean Absolute Error (MAE) by at least 10.6 and 24.5 on Part_A of the Shanghaitech dataset and on the UCF_CC_50 dataset, respectively, and reduced the Mean Squared Error (MSE) by at least 1.8 and 29.3, respectively. MsMCNN also achieved good results on Part_B of the Shanghaitech dataset. MsMCNN places more emphasis on combining shallow features and multi-scale features during feature extraction, which can effectively mitigate the loss of accuracy caused by scale and perspective variation and improve crowd counting performance.
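The abstract mentions two steps that are easy to illustrate: building a ground-truth density map with a Gaussian filter, and obtaining a count by integrating a density map. The following is a minimal sketch of both, not the authors' implementation. The function names, the fixed kernel width `sigma`, the border handling, and the sample head coordinates are all assumptions made for illustration; the paper may choose the kernel width differently.

```python
import numpy as np


def gaussian_kernel(sigma, truncate=3.0):
    """Return a 2-D Gaussian kernel normalized to sum to 1."""
    radius = int(truncate * sigma + 0.5)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(shape, head_points, sigma=4.0):
    """Place one unit-mass Gaussian at every annotated head position.

    shape: (height, width) of the image.
    head_points: iterable of (row, col) integer coordinates inside the image.
    sigma: fixed kernel width (an assumption, not taken from the paper).
    """
    height, width = shape
    kernel = gaussian_kernel(sigma)
    radius = kernel.shape[0] // 2
    density = np.zeros((height, width), dtype=np.float64)
    for row, col in head_points:
        # Image window covered by the kernel, clipped at the borders.
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        # Matching part of the kernel.
        patch = kernel[r0 - row + radius:r1 - row + radius,
                       c0 - col + radius:c1 - col + radius]
        # Renormalize so each person still contributes a total mass of 1.
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density


def count_from_density(density):
    """Discrete integration: the count is the sum of all pixel values."""
    return float(density.sum())


# Hypothetical annotations: four heads, two of them on the image border.
heads = [(10, 12), (40, 80), (0, 0), (63, 95)]
ground_truth = density_map((64, 96), heads)
print(count_from_density(ground_truth))
```

Because every head contributes a unit mass, the printed count should be 4 up to floating-point rounding. The same `count_from_density` step applies to a density map estimated by a network.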

Key words: crowd counting, density map, Convolutional Neural Network (CNN), multi-scale, perspective and scale variation
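The abstract reports MAE and MSE without giving their formulas. As a hedged illustration, the sketch below computes both over per-image counts. It assumes the convention common in crowd-counting papers such as MCNN, where the value reported as MSE is the square root of the mean squared count error; whether this paper follows that convention is an assumption. The function names and the sample counts are made up for the example.

```python
import math


def mae(true_counts, est_counts):
    """Mean absolute error between ground-truth and estimated counts."""
    n = len(true_counts)
    return sum(abs(t - e) for t, e in zip(true_counts, est_counts)) / n


def root_mse(true_counts, est_counts):
    """Square root of the mean squared count error.

    Crowd-counting papers often report this quantity under the name MSE;
    treating this paper the same way is an assumption.
    """
    n = len(true_counts)
    squared = sum((t - e) ** 2 for t, e in zip(true_counts, est_counts))
    return math.sqrt(squared / n)


# Made-up counts for three test images.
true_counts = [100, 250, 40]
est_counts = [110, 240, 45]
print(mae(true_counts, est_counts))
print(root_mse(true_counts, est_counts))
```

MAE reflects the average accuracy of the estimates, while the squared-error measure penalizes large errors on individual images more heavily, which is why the two are usually reported together.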


LU Jingang, ZHANG Li. Crowd counting model based on multi-scale multi-column convolutional neural network[J]. Journal of Computer Applications, 2019, 39(12): 3445-3449.


