Crowd counting method based on dual attention mechanism

Journal of Computer Applications, 2024, Vol. 44, Issue 9: 2886-2892. DOI: 10.11772/j.issn.1001-9081.2023091269

Section: Multimedia computing and computer simulation

Zhiqiang ZHAO¹,², Peihong MA¹, Xinhong HEI¹,²

1. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an Shaanxi 710048, China
2. Shaanxi Key Laboratory of Network Computing and Security Technology (Xi'an University of Technology), Xi'an Shaanxi 710048, China

Received: 2023-09-18 Revised: 2023-12-12 Accepted: 2023-12-15 Online: 2024-03-21 Published: 2024-09-10
Contact: Zhiqiang ZHAO
About the authors: MA Peihong, born in 1998 in Zhengzhou, Henan, M. S. candidate. Her research interests include computer vision.
HEI Xinhong, born in 1976 in Yan'an, Shaanxi, Ph. D., professor, CCF distinguished member. His research interests include machine learning.
Supported by: National Natural Science Foundation of China (61976177); Key R&D Program of Shaanxi Province (2023-YBGY-222)

Abstract

To address scale variation, background interference, and partial occlusion in crowd counting in complex scenes, a Dual Attention based Dilated Contextual Convolutional Neural Network (DA-DCCNN) built on dilated convolution was proposed. Firstly, the convolutional layers of VGG16 were used as the feature extractor to obtain abstract, deep feature maps of the crowd image. Secondly, a Dilated Context Module (DCM) was constructed from dilated convolutions to connect the features obtained from different layers, and a Spatial Attention Module (SAM) and a Channel Attention Module (CAM) were introduced to acquire contextual information. Finally, a loss function combining Euclidean distance and cross entropy was constructed to measure the difference between the predicted attention map and the ground-truth attention map. Experimental results on three public datasets, ShanghaiTech, UCF_CC_50 and UCF-QNRF, show that DA-DCCNN captures multi-scale image features effectively while strengthening the perception of important regions and channels, and achieves the lowest Mean Absolute Error (MAE) among the compared methods. The feature fusion network based on the dual attention mechanism perceives spatial structures and local features in images effectively, so that the generated density maps predict and count crowd regions more accurately.

Key words: dilated convolution, contextual feature, dual attention mechanism, density map, crowd counting
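
The abstract describes a loss that combines Euclidean distance with cross entropy. The sketch below shows one plausible way to combine the two terms on flattened maps. It is not the authors' formulation: the function names, the 0.5 scaling of the Euclidean term, the use of binary cross entropy, the pairing of each term with a particular map, and the `weight` value are all assumptions made for illustration.

```python
import math


def euclidean_loss(pred, target):
    """Half the sum of squared differences between two flattened maps."""
    if len(pred) != len(target):
        raise ValueError("maps must have the same number of pixels")
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy; pred values are probabilities in (0, 1)."""
    if len(pred) != len(target):
        raise ValueError("maps must have the same number of pixels")
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def combined_loss(pred_density, gt_density, pred_att, gt_att, weight=0.1):
    """Euclidean term plus a weighted cross-entropy term.

    `weight` is a hypothetical balancing factor, not a value from the paper.
    """
    return (euclidean_loss(pred_density, gt_density)
            + weight * binary_cross_entropy(pred_att, gt_att))


if __name__ == "__main__":
    loss = combined_loss([1.0, 2.0], [1.0, 4.0], [0.5], [1.0])
    print(f"combined loss: {loss:.4f}")
```

The squared-error term penalizes per-pixel regression error, while the cross-entropy term treats the attention map as a per-pixel foreground probability; how the two are balanced is a tuning choice.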

CLC number: TP183

Cite this article: Zhiqiang ZHAO, Peihong MA, Xinhong HEI. Crowd counting method based on dual attention mechanism[J]. Journal of Computer Applications, 2024, 44(9): 2886-2892.


Dataset	Samples	Annotated points	Resolution	Crowd size range
ShanghaiTech	1 198	330 165	768×1 024	9~578
UCF_CC_50	50	63 075	2 101×2 888	94~4 542
UCF-QNRF	1 535	>125×10⁴	2 013×2 902	49~12 865
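
The datasets above are annotated with one point per head, and the abstract refers to counting through density maps. A common way to turn point annotations into a density map is to place a normalized Gaussian at each point, so that the map sums to the number of people. The sketch below illustrates that idea only; the fixed `sigma`, the truncation `radius`, and the function names are assumptions, and the paper's actual ground-truth generation may differ.

```python
import math


def density_map(points, height, width, sigma=2.0, radius=6):
    """Build a density map from (row, col) head annotations.

    Each point contributes a truncated Gaussian that is renormalized over
    the pixels inside the image, so every person adds exactly 1 to the sum.
    Points are assumed to be integer pixel coordinates inside the image.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        weights = {}
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                d2 = (y - py) ** 2 + (x - px) ** 2
                weights[(y, x)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        for (y, x), w in weights.items():
            dmap[y][x] += w / total
    return dmap


def count_from_density(dmap):
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(5, 5), (0, 0), (10, 3)]  # hypothetical annotations
    dmap = density_map(heads, height=16, width=16)
    print(f"estimated count: {count_from_density(dmap):.3f}")
```

Renormalizing the kernel inside the image bounds keeps the count exact even for heads near the border, at the cost of slightly sharper peaks there.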


Dataset	Method	MAE	RMSE
ShanghaiTech	HA-CNN	62.9	94.9
	ADCrowdNet	55.4	97.9
	PCC Net	73.5	124.0
	LSC-CNN	66.4	117.0
	EPA	60.9	91.6
	Lw-Count	69.7	100.5
	CG-DRCN	60.2	94.0
	DA-DCCNN	49.6	87.1
UCF_CC_50	ACSCP	291.0	404.6
	MRA-CNN	240.8	352.6
	RANet	239.8	319.4
	MBTTBF-SCFB	233.1	300.9
	PCC Net	240.0	315.5
	LSC-CNN	225.6	302.7
	SDS-CNN	229.4	325.6
	EPA	250.1	342.7
	Lw-Count	239.3	307.6
	HANet	195.2	268.6
	DA-DCCNN	165.3	227.7
UCF-QNRF	RANet	111.0	190.0
	MBTTBF-SCFB	97.5	165.2
	PCC Net	246.4	247.1
	LSC-CNN	120.5	218.2
	KDMG	99.5	173.0
	SDS-CNN	115.2	175.7
	Lw-Count	149.7	238.4
	HANet	99.1	159.2
	DA-DCCNN	93.3	160.2
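
The comparison above reports MAE and RMSE over per-image counts. As a reminder of how these two metrics are usually computed in crowd counting, here is a minimal sketch using their standard definitions; the function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)


def rmse(pred_counts, gt_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty lists of equal length")
    mean_sq = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    predicted = [10, 20, 33]     # hypothetical per-image predictions
    ground_truth = [12, 20, 30]  # hypothetical per-image annotations
    print(f"MAE  = {mae(predicted, ground_truth):.4f}")
    print(f"RMSE = {rmse(predicted, ground_truth):.4f}")
```

MAE reflects average accuracy, while RMSE weights large per-image errors more heavily and is therefore often read as a robustness indicator.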


Combination No.	Method	MAE	RMSE
①	VGG	81.2	119.4
②	VGG+DCM	60.7	95.4
③	VGG+CAM	79.0	114.9
④	VGG+SAM	75.3	109.7
⑤	VGG+DAM	69.1	98.1
⑥	VGG+DCM+CAM	58.3	92.1
⑦	VGG+DCM+SAM	53.2	90.8
⑧	DA-DCCNN	49.6	87.1
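
The ablation rows above add channel attention (CAM) and spatial attention (SAM) to the backbone. To illustrate what reweighting "channels" versus "regions" means, the sketch below applies a heavily simplified, parameter-free gating to a small C x H x W feature map. It is not the paper's CAM or SAM: real attention modules contain learned layers, and the sigmoid gates, the parallel branches, and the summation used here are assumptions chosen to keep the example short.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate derived from its global average."""
    out = []
    for channel in feat:
        n = sum(len(row) for row in channel)
        gate = sigmoid(sum(sum(row) for row in channel) / n)
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial position by a gate derived from its channel mean."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[feat[k][i][j] * gate[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def dual_attention(feat):
    """Run both branches in parallel and sum them element-wise (assumed fusion)."""
    ca, sa = channel_attention(feat), spatial_attention(feat)
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(ca, sa)]


if __name__ == "__main__":
    features = [[[0.0, 0.0]], [[2.0, 2.0]]]  # hypothetical 2 x 1 x 2 feature map
    print(dual_attention(features))
```

The channel branch answers "which feature maps matter", the spatial branch answers "which locations matter"; the ablation table suggests the two are complementary when combined with the DCM.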

σ	MAE	RMSE
0.001	73.4	112.6
0.010	56.4	96.2
0.100	52.8	90.1
0.200	49.6	87.1
0.300	51.3	90.7

t	MAE	RMSE
0.000 1	54.6	97.3
0.001 0	49.6	87.1
0.005 0	53.8	89.5
0.010 0	57.7	96.4
0.100 0	58.3	95.6
