Crowd counting algorithm with multi-scale fusion based on normal inverse Gamma distribution

doi:10.11772/j.issn.1001-9081.2023060782

Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2243-2249. DOI: 10.11772/j.issn.1001-9081.2023060782

Multimedia computing and computer simulation

Wei LI¹, Xiaorong ZHANG¹, Peng CHEN¹, Qing LI², Changqing ZHANG²

1. The 28th Research Institute, China Electronics Technology Group Corporation, Nanjing, Jiangsu 210007, China
2. College of Intelligence and Computing, Tianjin University, Tianjin 300354, China

Received: 2023-06-27 Revised: 2023-08-24 Accepted: 2023-08-25 Online: 2023-09-04 Published: 2024-07-10
Contact: Wei LI
About author: ZHANG Xiaorong, born in 1989 in Taiyuan, Shanxi, M.S., engineer. Her research interests include argumentation in the field of armed police command and control.
CHEN Peng, born in 1978 in Mengyin, Shandong, professor-level senior engineer. His research interests include development planning and overall argumentation in the field of security (including armed police).
LI Qing, born in 1995 in Tianjin, Ph.D. candidate. His research interests include computer vision.
ZHANG Changqing, born in 1982 in Anyang, Henan, Ph.D., associate professor, CCF member. His research interests include machine learning and computer vision.
First author contact: LI Wei, born in 1990 in Jilin City, Jilin, M.S., engineer. His research interests include applications in the field of armed police command and control.
Supported by:
National Natural Science Foundation of China (61976151)

Abstract:

To address the large scale variation caused by differing distances between surveillance cameras and crowds in crowd analysis tasks, a crowd counting algorithm with multi-scale fusion based on normal inverse Gamma distribution, named MSF (Multi-Scale Fusion crowd counting), was proposed. Firstly, common features were extracted with a conventional backbone network, and pedestrian information at different scales was obtained with the multi-scale information extraction module. Secondly, the network for each scale contained a crowd density estimation module and an uncertainty estimation module that evaluates the reliability of that scale's prediction. Finally, the multi-scale prediction fusion module dynamically fused the per-scale predictions according to their reliability to obtain more accurate density regression results. Experimental results show that after the existing Congested Scene Recognition Network (CSRNet) is extended with multi-scale trustworthy fusion, the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of crowd counting on the UCF-QNRF dataset decrease by 4.43% and 1.37%, respectively, which verifies the rationality and effectiveness of the MSF algorithm. In addition, unlike existing methods, the MSF algorithm not only predicts the crowd density but also reports the reliability of each prediction at deployment, so that regions where the prediction is inaccurate can be flagged in time in practical applications, reducing the risk of wrong judgments in subsequent analysis tasks.
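The abstract does not spell out the fusion rule, so the following is only a minimal sketch of reliability-based fusion for a single pixel, assuming the normal-inverse-Gamma (NIG) summation described in reference [16] and the uncertainty measures of reference [15]. The class `NIG`, the functions `fuse` and `needs_warning`, the parameter names, and all numeric values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NIG:
    """Hypothetical NIG parameters for one pixel at one scale (assumed layout)."""
    delta: float  # predicted density value (mean of the NIG)
    nu: float     # confidence in the mean, nu > 0
    alpha: float  # inverse-Gamma shape, alpha > 1
    beta: float   # inverse-Gamma scale, beta > 0

    def aleatoric(self) -> float:
        """Expected data noise: beta / (alpha - 1)."""
        return self.beta / (self.alpha - 1.0)

    def epistemic(self) -> float:
        """Uncertainty of the predicted mean: beta / (nu * (alpha - 1))."""
        return self.beta / (self.nu * (self.alpha - 1.0))


def fuse(a: NIG, b: NIG) -> NIG:
    """NIG summation: the fused mean is weighted by nu, so the scale
    with more confidence in its mean dominates the result."""
    nu = a.nu + b.nu
    delta = (a.nu * a.delta + b.nu * b.delta) / nu
    alpha = a.alpha + b.alpha + 0.5
    # Disagreement between the two scales increases beta, and with it
    # the uncertainty of the fused prediction.
    beta = (a.beta + b.beta
            + 0.5 * a.nu * (a.delta - delta) ** 2
            + 0.5 * b.nu * (b.delta - delta) ** 2)
    return NIG(delta, nu, alpha, beta)


def needs_warning(p: NIG, threshold: float) -> bool:
    """Flag a pixel whose epistemic uncertainty exceeds an assumed threshold."""
    return p.epistemic() > threshold


# Illustrative values only: scale 1 is confident, scale 2 is not.
scale_1 = NIG(delta=0.80, nu=9.0, alpha=3.0, beta=0.2)
scale_2 = NIG(delta=0.20, nu=1.0, alpha=1.5, beta=0.6)
fused = fuse(scale_1, scale_2)

print(f"fused density    = {fused.delta:.3f}")
print(f"fused epistemic  = {fused.epistemic():.5f}")
print(f"warn on scale 2? = {needs_warning(scale_2, threshold=0.5)}")
```

In this toy case the fused density stays close to the confident scale's prediction, which is the behaviour that distinguishes reliability-based fusion from plain averaging. Applied per pixel over a density map, the same threshold test would yield a map of regions to flag at deployment.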

Key words: crowd counting, multi-scale, trustworthy fusion, crowd density estimation, uncertainty

CLC Number:

TP391.4

Wei LI, Xiaorong ZHANG, Peng CHEN, Qing LI, Changqing ZHANG. Crowd counting algorithm with multi-scale fusion based on normal inverse Gamma distribution[J]. Journal of Computer Applications, 2024, 44(7): 2243-2249.

References

1	JI L N, CHEN Q K, CHEN Y J, et al. Real-time crowd counting method from video stream based on GPU [J]. Journal of Computer Applications, 2017, 37(1): 145-152. (in Chinese)
2	YAO H, CAVALLARO A, BOUWMANS T, et al. Guest editorial introduction to the special issue on group and crowd behavior analysis for intelligent multicamera video surveillance [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 27(3): 405-408.
3	FU Q H, LI Q K, FU J N, et al. Dense crowd counting model based on spatial dimensional recurrent perception network [J]. Journal of Computer Applications, 2021, 41(2): 544-549. (in Chinese)
4	SHI Z L, YE Y D, WU Y P, et al. Crowd counting using rank-based spatial pyramid pooling network [J]. Acta Automatica Sinica, 2016, 42(6): 866-874. (in Chinese)
5	ZENG C, MA H. Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting [C]// Proceedings of the 2010 20th International Conference on Pattern Recognition. Piscataway: IEEE, 2010: 2069-2072.
6	CHAN A B, VASCONCELOS N. Bayesian Poisson regression for crowd counting [C]// Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Piscataway: IEEE, 2009: 545-551.
7	CHEN J, WANG Z. Crowd counting with segmentation attention convolutional neural network [J]. IET Image Processing, 2021, 15(6): 1221-1231.
8	CHEN M Y, WANG B S, CAO G, et al. Crowd counting method based on pixel-level attention mechanism [J]. Journal of Computer Applications, 2020, 40(1): 56-61. (in Chinese)
9	ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2881-2890.
10	HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
11	LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
12	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Proceedings of the 18th International Conference on Medical Image Computing and Computer-assisted Intervention. Cham: Springer, 2015: 234-241.
13	ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 589-597.
14	LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1091-1100.
15	AMINI A, SCHWARTING W, SOLEIMANY A, et al. Deep evidential regression [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2020: 14927-14937.
16	MA H, HAN Z, ZHANG C, et al. Trustworthy multimodal regression with mixture of normal-inverse Gamma distributions [C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2021: 6881-6893.
17	CHAN A B, LIANG Z-S J, VASCONCELOS N. Privacy preserving crowd monitoring: counting people without people models or tracking [C]// Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2008: 1-7.
18	OÑORO-RUBIO D, LÓPEZ-SASTRE R J. Towards perspective-free object counting with deep learning [C]// Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer, 2016: 615-629.
19	SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4031-4039.
20	HAFNER D, TRAN D, LILLICRAP T, et al. Noise contrastive priors for functional uncertainty [C/OL]// Proceedings of the 2019 Conference on Uncertainty in Artificial Intelligence. [2023-05-30].
21	MOLCHANOV D, ASHUKHA A, VETROV D. Variational dropout sparsifies deep neural networks [C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 2498-2507.
22	GAL Y, GHAHRAMANI Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning [C]// Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 1050-1059.
23	MUKHOTI J, GAL Y. Evaluating Bayesian deep learning methods for semantic segmentation [EB/OL]. (2019-03-23) [2023-07-20].
24	LAKSHMINARAYANAN B, PRITZEL A, BLUNDELL C. Simple and scalable predictive uncertainty estimation using deep ensembles [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2017: 6405-6416.
25	ANTORÁN J, ALLINGHAM J U, HERNÁNDEZ-LOBATO J M. Depth uncertainty in neural networks [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2020: 10620-10634.
26	YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions [C/OL]// Proceedings of the 2016 International Conference on Learning Representations. [2023-05-30].
27	KENDALL A, GAL Y. What uncertainties do we need in Bayesian deep learning for computer vision? [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2017: 5580-5590.
28	IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images [C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2547-2554.
29	IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 544-559.
30	ZHANG C, LI H, WANG X, et al. Cross-scene crowd counting via deep convolutional neural networks [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 833-841.
31	SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting [C]// Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE, 2017: 1-6.
32	YI J, SHEN Z, CHEN F, et al. A lightweight multiscale feature fusion network for remote sensing object counting [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5902113.
33	WANG Q, GAO J, LIN W, et al. Learning from synthetic data for crowd counting in the wild [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8198-8207.

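Model	Module	Structure
Base Model+MSF	Multi-scale information extraction module (dilation rates 1 and 2)	Conv(512, 256, 3, ReLU)
	Multi-scale information extraction module (dilation rates 1 and 2)	Conv(256, 128, 3, ReLU)
	Crowd density estimation module	Conv(128, 1, 1, None)
	Uncertainty estimation module	Conv(128, 3, 1, None)
CSRNet+MSF	Multi-scale information extraction module (dilation rates 1 and 2)	Back-end of CSRNet (6 convolutional layers)
	Crowd density estimation module	Conv(64, 1, 1, None)
	Uncertainty estimation module	Conv(64, 3, 1, None)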
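The layer notation in the table above is not defined on this page. The sketch below reads each entry as (input channels, output channels, kernel size, activation), assumes one branch per dilation rate, and assumes every convolution carries a bias term. Under those assumptions it shows how the dilation rate changes the span of a 3×3 kernel and how many parameters one Base Model+MSF branch would hold. The `Conv` class and `branch` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    """One table entry, read as (in_ch, out_ch, kernel, activation); assumed reading."""
    in_ch: int
    out_ch: int
    kernel: int
    dilation: int = 1

    def params(self) -> int:
        """Weights plus one bias per output channel (bias is an assumption)."""
        return self.in_ch * self.out_ch * self.kernel ** 2 + self.out_ch

    def effective_kernel(self) -> int:
        """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
        return self.kernel + (self.kernel - 1) * (self.dilation - 1)

    def same_padding(self) -> int:
        """Padding that preserves spatial size at stride 1."""
        return (self.effective_kernel() - 1) // 2


def branch(dilation: int) -> list:
    """Layers of one scale branch of Base Model+MSF, as listed in the table.
    The two 1x1 heads both read the 128-channel features (assumed parallel)."""
    return [
        Conv(512, 256, 3, dilation),  # multi-scale information extraction
        Conv(256, 128, 3, dilation),  # multi-scale information extraction
        Conv(128, 1, 1),              # crowd density estimation head
        Conv(128, 3, 1),              # uncertainty estimation head
    ]


for d in (1, 2):
    layers = branch(d)
    first = layers[0]
    total = sum(layer.params() for layer in layers)
    print(f"dilation {d}: effective kernel {first.effective_kernel()}, "
          f"padding {first.same_padding()}, parameters {total}")
```

Both branches hold the same number of parameters; only the spacing of the kernel taps differs, which is how dilation rates 1 and 2 can view pedestrians at different scales at equal cost.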

Algorithm	ShanghaiTech Part A		ShanghaiTech Part B		UCF-QNRF		UCF_CC_50
	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE
Crowd-CNN [30]	181.8	277.7	32.0	49.8	-	-	467	498.5
MCNN [13]	110.2	173.2	26.4	41.3	277.0	426.0	377.6	509.1
CMTL [31]	101.3	152.4	20.0	31.1	252.0	514.0	322.8	341.4
Switch-CNN [19]	90.4	135.0	21.6	33.4	228.0	445.0	318.1	439.2
LMSFFNet [32]	85.9	139.9	9.2	15.1	112.8	201.6	105.7	120.3
Base model [33]	71.4	115.7	10.3	16.5	119.3	207.7	290.0	406.4
Base model [33]+MSF	67.1	110.3	8.3	12.7	108.5	185.5	274.3	363.9
CSRNet [14]	68.2	115.0	10.6	16.0	110.6	190.1	266.1	397.5
CSRNet [14]+MSF	67.2	110.0	8.7	13.2	105.7	187.5	266.6	346.9
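The reductions quoted in the abstract can be recomputed from the UCF-QNRF columns of the table above. The sketch below does that arithmetic and also shows how the two metrics are typically computed from per-image counts. Much of the crowd counting literature reports the root of the mean squared error under the name "MSE"; whether this paper follows that convention is an assumption here, and the toy counts are invented for illustration.

```python
import math


def mae(pred, true):
    """Mean absolute error of per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    """Root of the mean squared count error (the convention assumed here)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def relative_reduction(before, after):
    """Percentage by which an error metric drops."""
    return 100.0 * (before - after) / before


# UCF-QNRF values from the table: CSRNet versus CSRNet+MSF.
mae_gain = relative_reduction(110.6, 105.7)
mse_gain = relative_reduction(190.1, 187.5)
print(f"MAE reduction: {mae_gain:.2f}%")
print(f"MSE reduction: {mse_gain:.2f}%")

# Toy counts, illustration only.
predicted = [105, 190, 330]
actual = [100, 200, 300]
print(f"toy MAE = {mae(predicted, actual):.2f}, toy MSE = {mse(predicted, actual):.2f}")
```

The toy example shows why the two metrics are reported together: the single large miss on the third image raises the squared-error metric well above the absolute-error one.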

Algorithm	Part A		Part B
	MAE	MSE	MAE	MSE
Scale 1	68.3	114.4	9.7	15.1
Scale 2	67.8	112.3	9.5	15.7
Average fusion	67.9	112.5	9.2	14.1
MSF	67.2	110.0	8.7	13.2
