Journal of Computer Applications (计算机应用), 2019, Vol. 39, Issue 12: 3445-3449. DOI: 10.11772/j.issn.1001-9081.2019081437

• Papers of the 17th China Conference on Machine Learning (CCML 2019) •

Crowd counting model based on multi-scale multi-column convolutional neural network

LU Jingang1, ZHANG Li1,2

  1. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China;
  2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China
  • Received: 2019-04-29 Revised: 2019-08-14 Online: 2019-12-10 Published: 2019-09-04
  • Contact: ZHANG Li
  • About the authors: LU Jingang (born 1993), male, from Nantong, Jiangsu, M.S. candidate; research interests: dense crowd counting, deep learning, machine learning. ZHANG Li (born 1975), female, from Zhangjiagang, Jiangsu, professor, Ph.D., CCF member; research interests: machine learning, pattern recognition.
  • Supported by:
    The work is partially supported by the Six Talent Peak Project of Jiangsu Province (XYDXX-054).

Abstract: Scale and perspective variation degrade the performance of crowd counting in surveillance videos and images. To address this problem, a dense crowd counting model based on a Multi-scale Multi-column Convolutional Neural Network (MsMCNN) was proposed. Before feature extraction with MsMCNN, the dataset was processed with a Gaussian filter to obtain the ground-truth density maps of the images, and data augmentation was performed. With the multi-column convolutional neural network structure as the backbone, MsMCNN first extracted feature maps from multiple columns at multiple scales. Then, feature maps with the same resolution were concatenated within the same column to generate the estimated density map of the image. Finally, the crowd count was obtained by integrating the estimated density map. To verify the effectiveness of the proposed model, experiments were conducted on the Shanghaitech and UCF_CC_50 datasets. Compared with the classic models Crowdnet, Multi-column Convolutional Neural Network (MCNN), Cascaded Multi-Task Learning (CMTL) and Scale-adaptive Convolutional Neural Network (SaCNN), MsMCNN reduced the Mean Absolute Error (MAE) by at least 10.6 on Part_A of the Shanghaitech dataset and by at least 24.5 on the UCF_CC_50 dataset, and reduced the Mean Squared Error (MSE) by at least 1.8 and 29.3, respectively. MsMCNN also achieved good results on Part_B of the Shanghaitech dataset. Because MsMCNN emphasizes the combination of shallow features and of multi-scale features during feature extraction, it can effectively reduce the accuracy loss caused by scale and perspective variation and improve crowd counting performance.

Key words: crowd counting, density map, Convolutional Neural Network (CNN), multi-scale, scale and perspective variation
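
The abstract names three steps that are independent of the network itself: building a ground-truth density map with a Gaussian filter, integrating a density map to obtain a count, and scoring with MAE and MSE. The following is a minimal sketch of those steps only, not the authors' implementation. The function names, the kernel size and the value of sigma are illustrative assumptions; the abstract does not give them. Head positions are assumed to lie inside the image. The mse function follows the name used in the abstract (mean of squared errors); many crowd counting papers report the square root of this quantity under the same name, and the abstract does not say which form is used.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size x size) Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(shape, heads, sigma=4.0, size=15):
    """Place one unit-mass Gaussian at every annotated head position.

    shape: (height, width) of the image.
    heads: iterable of (row, col) integer positions inside the image.
    sigma, size: assumed values for illustration (size must be odd).
    """
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    kernel = gaussian_kernel(size, sigma)
    r = size // 2
    for row, col in heads:
        # Clip the kernel window to the image borders.
        r0, r1 = max(row - r, 0), min(row + r + 1, h)
        c0, c1 = max(col - r, 0), min(col + r + 1, w)
        patch = kernel[r0 - (row - r):r1 - (row - r),
                       c0 - (col - r):c1 - (col - r)]
        # Renormalize so a head near the border still contributes mass 1.
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


def count_from_density(dmap) -> float:
    """Integrate (sum) a density map to get the estimated head count."""
    return float(np.sum(dmap))


def mae(true_counts, est_counts) -> float:
    """Mean absolute error between true and estimated counts."""
    t = np.asarray(true_counts, dtype=np.float64)
    e = np.asarray(est_counts, dtype=np.float64)
    return float(np.mean(np.abs(t - e)))


def mse(true_counts, est_counts) -> float:
    """Mean of squared errors (take the square root for the RMSE variant)."""
    t = np.asarray(true_counts, dtype=np.float64)
    e = np.asarray(est_counts, dtype=np.float64)
    return float(np.mean((t - e) ** 2))


if __name__ == "__main__":
    # Three hypothetical head annotations, two of them at image corners.
    heads = [(0, 0), (10, 20), (31, 47)]
    dmap = density_map((32, 48), heads)
    print("count from density map:", count_from_density(dmap))
    print("MAE:", mae([10, 20], [12, 17]))
    print("MSE:", mse([10, 20], [12, 17]))
```

Because every head contributes a kernel that sums to 1, the sum over the ground-truth map equals the number of annotated heads up to floating-point rounding. The same summation applied to a network's estimated density map gives the predicted count.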

CLC number: