Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2233-2242. DOI: 10.11772/j.issn.1001-9081.2023070918

• Multimedia computing and computer simulation •

Crowd counting and locating method based on pixel distance map and four-dimensional dynamic convolutional network

Yangyi GAO1,2, Tao LEI1,2, Xiaogang DU1, Suiyong LI3, Yingbo WANG1, Chongdan MIN1,2

  1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, Shaanxi 710021, China
  2. Shaanxi Joint Laboratory of Artificial Intelligence (Shaanxi University of Science and Technology), Xi’an, Shaanxi 710021, China
  3. China Railway First Survey and Design Institute Group Company Limited, Xi’an, Shaanxi 710043, China
  • Received: 2023-07-11 Revised: 2023-09-18 Accepted: 2023-09-20 Online: 2023-10-26 Published: 2024-07-10
  • Contact: Tao LEI
  • About the authors: GAO Yangyi, born in 1998 in Xianyang, Shaanxi, M.S. candidate. His research interests include image processing and machine learning.
    LEI Tao, born in 1981 in Weinan, Shaanxi, Ph.D., professor, CCF member. His research interests include artificial intelligence and image processing.
    DU Xiaogang, born in 1985 in Baoji, Shaanxi, Ph.D., professor, CCF member. His research interests include image processing, computer vision, and machine learning.
    LI Suiyong, born in 1982 in Xianyang, Shaanxi, senior engineer. His research interests include communication engineering and machine learning.
    WANG Yingbo, born in 1992 in Yimeng, Shandong, Ph.D., lecturer. His research interests include scene perception and image processing.
    MIN Chongdan, born in 1997 in Weinan, Shaanxi, M.S. candidate. Her research interests include artificial intelligence and image processing.
  • Supported by:
    National Natural Science Foundation of China (62271296); Shaanxi Science Fund for Distinguished Young Scholars (2021JC-47)

Abstract:

Regressing a density map with a Convolutional Neural Network (CNN) has become the mainstream approach to crowd counting and locating, but existing methods still have two problems. First, density maps produced by traditional methods suffer from adhesion and overlap in crowded areas, which causes errors in the network's final counting and locating results. Second, because the weights of a conventional convolution are fixed, it cannot extract image features adaptively and struggles with images that have complex backgrounds and uneven crowd density distributions. To solve these problems, a method for counting and locating dense crowds was proposed based on Pixel Distance Map (PDMap) and Four-Dimensional Dynamic Convolutional Network (FDDCNet). First, a new PDMap was defined, which uses the spatial distance relationships between pixel-level annotation points and applies an inversion operation to improve the smoothness of the pixels around each head center point, thereby avoiding adhesion and overlap in crowded areas. Second, an FDDC module was designed to adaptively change the weights along the four dimensions of the convolution and to extract the prior knowledge provided by different views, addressing the counting and locating difficulties caused by complex scenes and uneven distribution and improving the generalization ability and robustness of the model. Finally, a threshold was used to filter out locally uncertain predicted values, further improving the accuracy of counting and locating.
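The abstract does not give the exact PDMap formula, so the following is only a minimal sketch of the general idea under stated assumptions: each pixel stores the Euclidean distance to its nearest annotated head point, clipped at a hypothetical radius `d_max` and inverted so that head centers become isolated peaks. The names `pixel_distance_map`, `locate_heads`, `d_max`, and `threshold` are illustrative and do not come from the paper. Heads are then recovered by local maximum detection with a threshold that discards low, uncertain responses.

```python
import math


def pixel_distance_map(height, width, points, d_max=4.0):
    """Inverted, clipped distance to the nearest annotated head point.

    Assumed formulation (not taken from the paper):
        value = 1 - min(d, d_max) / d_max
    so an annotated pixel has value 1.0 and the value falls to 0.0 at d_max.
    """
    pdmap = [[0.0] * width for _ in range(height)]
    if not points:
        return pdmap
    for y in range(height):
        for x in range(width):
            # Brute-force nearest annotation; fine for a small illustration.
            d = min(math.hypot(y - py, x - px) for py, px in points)
            pdmap[y][x] = 1.0 - min(d, d_max) / d_max
    return pdmap


def locate_heads(pdmap, threshold=0.5):
    """Return (row, col) of strict 8-neighbour local maxima >= threshold."""
    height, width = len(pdmap), len(pdmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            v = pdmap[y][x]
            if v < threshold:
                continue  # filter out uncertain, low-valued responses
            neighbours = [
                pdmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v > n for n in neighbours):
                peaks.append((y, x))
    return peaks


if __name__ == "__main__":
    # Two heads four pixels apart on a 5 x 9 grid.
    pdmap = pixel_distance_map(5, 9, [(2, 2), (2, 6)], d_max=4.0)
    heads = locate_heads(pdmap, threshold=0.5)
    print("count:", len(heads), "locations:", heads)
```

In this toy setting the count is the number of detected peaks, so counting and locating come from the same map. In practice the map would be predicted by the network rather than built from annotations, and a vectorized distance transform would replace the brute-force loop.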
On the test set of the NWPU-Crowd dataset, for crowd counting, the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the proposed method were 82.4 and 334.7, respectively, which were 8.7% and 26.9% lower than those of Multi-scale Feature Pyramid Network (MFP-Net); for crowd locating, the comprehensive evaluation indicator F1 value and the precision of the proposed method were 71.2% and 73.6%, respectively, which were 3.0% and 5.9% higher than those of Topological Count (TopoCount). The experimental results show that the proposed method can process dense crowd images with complex backgrounds and achieves higher counting and locating accuracy.
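For readers unfamiliar with the reported metrics, the sketch below shows how they are commonly computed. It rests on two assumptions that the abstract does not confirm: that "MSE" is reported as the root of the mean squared error, as many crowd-counting papers do (the `root` flag is hypothetical), and that locating scores are derived from counts of matched and unmatched points. The function names and the sample numbers are illustrative only and are not results from the paper.

```python
import math


def counting_errors(predicted, actual, root=True):
    """Return (MAE, MSE) over per-image crowd counts.

    With root=True the second value is the square root of the mean squared
    error, a convention often labelled "MSE" in crowd counting (assumption).
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, (math.sqrt(mse) if root else mse)


def locating_scores(tp, fp, fn):
    """Return (precision, recall, F1) from matched and unmatched point counts.

    tp: predicted points matched to an annotation
    fp: predicted points with no match
    fn: annotations with no matching prediction
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts for three images, purely for illustration.
    print(counting_errors([10, 20, 33], [12, 20, 30]))
    print(locating_scores(tp=6, fp=2, fn=4))
```

Because the squared term weights large per-image errors more heavily than MAE does, a few badly miscounted dense scenes can dominate MSE, which is one reason the two counting metrics are reported together.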

Key words: Convolutional Neural Network (CNN), crowd counting, crowd locating, distance variation, dynamic convolution, local maximum detection

CLC Number: