Journal of Computer Applications (《计算机应用》): sole official website


Dense crowd counting and localization method based on pixel distance map and four-dimensional dynamic convolutional network

高阳峄1,雷涛2,杜晓刚3,李岁永4,王营博1,闵重丹1   

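    1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology
    2. School of Electrical and Information Engineering, Shaanxi University of Science and Technology
    3. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070
    4. China Railway First Survey and Design Institute Group Co., Ltd.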
  • Received: 2023-07-11 Revised: 2023-09-12 Online: 2023-10-26 Published: 2023-10-26
  • Corresponding author: 高阳峄

Method for crowd counting and location based on pixel distance map and four-dimensional dynamic convolutional network

  • Received: 2023-07-11 Revised: 2023-09-12 Online: 2023-10-26 Published: 2023-10-26

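Abstract (translated from the Chinese): Regressing density maps with a convolutional neural network (CNN) has become the mainstream approach to crowd counting and localization, but existing methods still have two problems. First, the density maps produced by traditional methods suffer from adhesion and overlap in densely crowded regions, which leads to errors in the network's final counts and locations. Second, because its weights are fixed, conventional convolution cannot extract image features adaptively and struggles with images that have complex backgrounds or unevenly distributed crowd density. To address these problems, a dense crowd counting and localization method based on a pixel distance map (PDMap) and a four-dimensional dynamic convolutional network (FDDCNet) is proposed. First, a new PDMap is defined; it uses the spatial distance relationships between pixel-level annotation points and applies an inversion operation to enhance the smoothness of the pixels around each head center, resolving adhesion and overlap in densely crowded regions. Second, an FDDC module is designed that adaptively changes the convolution weights along four dimensions and extracts the prior knowledge provided by different views, in order to cope with the counting and localization difficulties caused by complex scenes and uneven distribution and to improve the generalization ability and robustness of the network model. Finally, a threshold is used to filter out locally uncertain predictions, further improving counting and localization accuracy. On the NWPU-Crowd dataset, the proposed method achieves a mean absolute error (MAE) of 82.4 and a mean squared error (MSE) of 431.6, which are 8.7% and 5.8% lower than those of MFP-Net (Multi-scale Feature Pyramid Network); its overall F1 score and precision P are 71.2% and 73.6%, improvements of 3.0% and 5.9% over TopoCount (Topological Count). The experimental results show that the proposed method can handle dense crowd images with complex backgrounds and achieves higher counting accuracy and localization precision.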
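The abstract describes the PDMap only in words, so the following is a minimal sketch of one plausible reading, not the authors' definition. It assumes each pixel stores an inverted, clipped Euclidean distance to the nearest annotated head point, and that localization keeps local maxima above a threshold. The function names, the `max_dist` clipping radius, the linear decay, and the toy annotations are all hypothetical.

```python
import math


def inverted_distance_map(height, width, points, max_dist=4.0):
    """Build a height x width map from (x, y) head annotations.

    Assumed form (not taken from the paper): each pixel stores
    1 - min(d, max_dist) / max_dist, where d is the Euclidean distance
    to the nearest annotated point. The value is 1.0 at a head center
    and falls linearly to 0.0 at max_dist pixels or farther.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nearest = min(
                (math.hypot(x - px, y - py) for px, py in points),
                default=max_dist,  # no annotations: treat as background
            )
            # Inversion: small distances become large responses.
            dmap[y][x] = 1.0 - min(nearest, max_dist) / max_dist
    return dmap


def local_peaks(dmap, threshold=0.5):
    """Return (x, y) pixels that are >= all 8 neighbours and >= threshold."""
    height, width = len(dmap), len(dmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = dmap[y][x]
            if value < threshold:
                continue  # discard low-confidence responses
            neighbours = [
                dmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((x, y))
    return peaks


# Two heads only two pixels apart (hypothetical toy annotations).
heads = [(2, 2), (4, 2)]
pd_map = inverted_distance_map(height=5, width=7, points=heads)
detected = local_peaks(pd_map, threshold=0.5)
count = len(detected)
```

In this toy case the two nearby heads remain two separate peaks, which is the kind of behaviour the abstract attributes to the PDMap in crowded regions. A real pipeline would apply the peak search to a network's predicted map rather than to the ground-truth map built here.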

Abstract: Obtaining regression density maps with a Convolutional Neural Network (CNN) has become the mainstream approach to crowd counting and localization; however, existing methods still have two problems. Firstly, the density maps obtained by traditional methods suffer from adhesion and overlap in crowded areas, which leads to errors in the network's final crowd counts and locations. Secondly, because the weights of conventional convolution are fixed, it cannot extract image features adaptively and has difficulty processing images with complex backgrounds and uneven crowd density distribution. To solve these problems, a method for counting and locating dense crowds was proposed based on Pixel Distance Map (PDMap) and Four-Dimensional Dynamic Convolutional Network (FDDCNet). Firstly, a new PDMap was defined, which used the spatial distance relationships between pixel-level annotation points and enhanced the smoothness of the pixels around each head center point through an inversion operation, thereby resolving adhesion and overlap in densely crowded areas. Secondly, a four-dimensional dynamic convolution (FDDC) module was designed to adaptively change the convolution weights in four dimensions and extract the prior knowledge provided by different views, in order to deal with the counting and localization difficulties caused by complex scenes and uneven distribution, which improved the generalization ability and robustness of the model. Finally, a threshold was used to filter out locally uncertain predicted values to further improve the accuracy of counting and localization. On the NWPU-Crowd dataset, the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the proposed method were 82.4 and 431.6, respectively, which were 8.7% and 5.8% lower than those of MFP-Net (Multi-scale Feature Pyramid Network). The comprehensive evaluation index F1 and precision P of the proposed method were improved by 3.0% and 5.9% to 71.2% and 73.6%, respectively, compared with TopoCount (Topological Count). The experimental results show that the proposed method can process dense crowd images with complex backgrounds and achieves higher counting accuracy and localization accuracy.
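For readers unfamiliar with the reported metrics, the sketch below shows how MAE, MSE, precision, recall and F1 are conventionally computed. It uses the standard textbook definitions, which the abstract does not spell out; many crowd-counting papers report the square root of the mean squared error under the name MSE, so both forms are computed. The toy counts and the function names are hypothetical and are not results from the paper. The last function rearranges F1 = 2PR / (P + R) to recover the recall implied by a given F1 and precision.

```python
import math


def mae(pred, true):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    """Mean squared error between predicted and ground-truth counts."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def precision_recall_f1(tp, fp, fn):
    """Localization metrics from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def recall_from_f1_precision(f1, precision):
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical per-image counts, for illustration only.
pred_counts = [10, 22, 35]
true_counts = [12, 20, 35]
toy_mae = mae(pred_counts, true_counts)
toy_mse = mse(pred_counts, true_counts)
toy_rmse = math.sqrt(toy_mse)  # the variant often reported as "MSE"

# Recall implied by the reported F1 = 71.2% and P = 73.6%.
implied_recall = recall_from_f1_precision(71.2, 73.6)
```

Under these definitions, the reported F1 and precision imply a recall of roughly 69%, subject to rounding in the published figures.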

CLC number: