Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (1): 56-61. DOI: 10.11772/j.issn.1001-9081.2019050920

• Artificial intelligence •

Crowd counting method based on pixel-level attention mechanism

CHEN Meiyun, WANG Bisheng, CAO Guo, LIANG Yongbo   

  1. College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing Jiangsu 210094, China
  • Received: 2019-06-04  Revised: 2019-08-30  Online: 2020-01-10  Published: 2019-09-24
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20191284) and the Dongyang Primary and Secondary School Safe Campus "Smart Eyes" Government Procurement Project (DYCG2018-A032).

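  • Corresponding author: CAO Guo
  • About the authors: CHEN Meiyun (born 1995), female, from Hangzhou, Zhejiang, M.S. candidate; research interests: deep learning, crowd counting. WANG Bisheng (born 1994), male, from Yancheng, Jiangsu, Ph.D. candidate, CCF member; research interests: deep learning, object detection. CAO Guo (born 1977), male, from Jinan, Shandong, professor, Ph.D., CCF member; research interests: image processing, pattern recognition. LIANG Yongbo (born 1995), male, from Shangqiu, Henan, M.S. candidate, CCF member; research interests: deep learning, image processing.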

Abstract: To address uneven crowd distribution and the large number of network parameters to be learned, an accurate high-density crowd counting method is proposed. It consists of a Pixel-level Attention Mechanism (PAM) and an improved single-column crowd density estimation network. First, the PAM classifies crowd images at the pixel level to produce high-quality local crowd density maps: a Fully Convolutional Network (FCN) generates a density mask for each image, assigning every pixel to one of several density levels. Then, with the generated density mask as the label, the single-column crowd density estimation network learns more representative features with fewer parameters. Before this method was proposed, the Network for Congested Scene Recognition (CSRNet) had the smallest counting error on part_B of the Shanghaitech dataset, the UCF_CC_50 dataset, and the WorldExpo'10 dataset. Compared with CSRNet, the proposed method reduces the Mean Absolute Error (MAE) and Mean Squared Error (MSE) on part_B of the Shanghaitech dataset by 8.49% and 4.37%, respectively, and on the UCF_CC_50 dataset by 58.38% and 51.98%, respectively, which is a substantial improvement; on the WorldExpo'10 dataset, the overall average MAE is reduced by 1.16%. The experimental results show that, for unevenly distributed high-density crowds, combining the PAM with the single-column crowd density estimation network effectively improves both the accuracy and the training efficiency of crowd counting.
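The error metrics quoted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the convention common in crowd counting work, where "MSE" is reported as the root of the mean squared count error. The function names and the sample counts are hypothetical.

```python
import math

def mae(true_counts, pred_counts):
    """Mean Absolute Error over per-image crowd counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n

def mse(true_counts, pred_counts):
    """'MSE' under the assumed crowd counting convention:
    the square root of the mean squared count error."""
    n = len(true_counts)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts)) / n)

def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Made-up counts for three test images (illustration only).
true_counts = [100, 250, 40]
pred_counts = [90, 260, 43]

print(mae(true_counts, pred_counts))
print(mse(true_counts, pred_counts))
# A baseline error of 10.0 lowered to 9.0 is a 10% reduction.
print(relative_reduction(10.0, 9.0))
```

A statement such as "MAE reduced by 8.49%" corresponds to `relative_reduction` applied to the baseline error and the proposed method's error on the same dataset.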

Key words: uneven distribution of crowd, Pixel-level Attention Mechanism (PAM), single-column crowd density estimation network, high-density crowd, Fully Convolutional Network (FCN), density mask
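To make the notion of a density mask concrete, the sketch below quantizes a density map into discrete per-pixel density levels. It is only an illustration under stated assumptions: the abstract does not say how the mask labels are derived, so the thresholds, the function name `density_mask`, and the sample values are hypothetical, and in the described method the mask itself is predicted by an FCN rather than computed by thresholding.

```python
import numpy as np

def density_mask(density_map, thresholds):
    """Assign each pixel a density level in 0..len(thresholds).

    thresholds must be increasing; a pixel whose density lies in
    [thresholds[i-1], thresholds[i]) receives level i (hypothetical scheme).
    """
    return np.digitize(density_map, bins=np.asarray(thresholds))

# Made-up 2x2 density map; summing a density map gives the estimated count.
density_map = np.array([[0.0, 0.001],
                        [0.02, 0.3]])
thresholds = [0.0005, 0.01, 0.1]  # assumed level boundaries

mask = density_mask(density_map, thresholds)
print(mask)               # per-pixel density levels
print(density_map.sum())  # estimated number of people in this patch
```

With three thresholds the mask has four levels, from background (0) to the densest regions (3).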

CLC Number: