Lightweight human pose estimation based on attention mechanism

Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2407-2414. DOI: 10.11772/j.issn.1001-9081.2021061103

Section: Artificial intelligence


Kun LI¹, Qing HOU¹,²

¹ College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
² Guizhou Communication Industry Service Company Limited, Guiyang, Guizhou 550005, China

Received: 2021-06-29; Revised: 2021-09-21; Accepted: 2021-09-28; Online: 2022-08-09; Published: 2022-08-10
Contact: Qing HOU
About the authors: LI Kun, born in 1997, native of Weifang, Shandong, M. S. candidate. His research interests include image processing and computer vision.
HOU Qing, born in 1980, native of Tianjin, Ph. D., research fellow, CCF member. His research interests include data mining and image processing.
Supported by: National Innovation City “Hundred Cities and Hundred Gardens” Action Project (Architectural Science Project [2020] No. 22); Guizhou Big Data Special Program (20200424)

Abstract


To address the large number of parameters and the high computational complexity of high-resolution human pose estimation networks, a lightweight Sandglass Coordinate Attention Network (SCANet) based on the High-Resolution Network (HRNet) was proposed for human pose estimation. First, the Sandglass module and the Coordinate Attention (CoordAttention) module were introduced. Then, on this basis, two lightweight modules were built: the Sandglass Coordinate Attention bottleneck (SCAneck) module and the Sandglass Coordinate Attention basicblock (SCAblock) module. These modules capture long-range dependencies along the spatial directions of the feature map together with precise positional information, while reducing the number of model parameters and the computational complexity. Experimental results show that, with the same image resolution and environment configuration, SCANet reduces the number of parameters by 52.6% and the computational complexity by 60.6% compared with HRNet on the Common Objects in COntext (COCO) validation set, and reduces them by 52.6% and 61.1% respectively on the Max Planck Institute for Informatics (MPII) validation set. Compared with common human pose estimation networks such as the Stacked Hourglass Network (Hourglass), the Cascaded Pyramid Network (CPN) and SimpleBaseline, SCANet still predicts human keypoints with high accuracy (72.3% mAP on the COCO validation set at 256×192, compared with 66.9%, 68.6% and 70.4% for Hourglass, CPN and SimpleBaseline) while using fewer parameters and less computation.
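Much of the parameter saving described above comes from replacing full 3×3 convolutions with depthwise and pointwise (1×1) convolutions. The Python sketch below counts weights for a standard convolution, a depthwise separable convolution, an HRNet-style basic block and a Sandglass-style block. It is an illustration under stated assumptions, not the SCANet implementation: the Sandglass layout (3×3 depthwise, 1×1 reduction, 1×1 expansion, 3×3 depthwise) is assumed from the design of Zhou et al. [6], biases and batch normalization are ignored, the CoordAttention branch is left out, and the function names and the `reduction` ratio are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise convolution (one filter per input channel) plus a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def basic_block_params(c: int) -> int:
    """HRNet/ResNet-style basic block: two 3x3 convolutions, c -> c -> c."""
    return 2 * standard_conv_params(3, c, c)


def sandglass_block_params(c: int, reduction: int) -> int:
    """Assumed Sandglass layout: 3x3 depthwise (c), 1x1 reduce (c -> c // reduction),
    1x1 expand (c // reduction -> c), 3x3 depthwise (c). The residual joins the two c-channel ends."""
    hidden = c // reduction
    depthwise = 3 * 3 * c  # one 3x3 filter per channel
    return 2 * depthwise + c * hidden + hidden * c


if __name__ == "__main__":
    c = 64
    print("3x3 conv:", standard_conv_params(3, c, c))                      # 36864
    print("depthwise separable:", depthwise_separable_params(3, c, c))     # 4672
    print("basic block:", basic_block_params(c))                           # 73728
    print("sandglass block (reduction 2):", sandglass_block_params(c, 2))  # 5248
```

For 64 channels and a reduction ratio of 2, the basic block has 73,728 weights and the Sandglass-style block 5,248, roughly 7% as many. The whole-network reduction reported for SCANet (52.6%) is much smaller than this per-block ratio, so these numbers only indicate the direction of the saving, not its size.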

Key words: human pose estimation, deep neural network, High-Resolution Network (HRNet), depthwise separable convolution, attention mechanism
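The CoordAttention module (Hou et al. [7]) factorizes global pooling into two one-dimensional poolings, one along the width and one along the height, so that the resulting attention weights keep positional information along each spatial direction. The following pure-Python sketch shows that data flow on a small C×H×W feature map stored as nested lists. It is a simplified illustration, not the SCANet or reference code: batch normalization is omitted, ReLU stands in for the h-swish activation of the original module, the 1×1 convolutions are explicit bias-free weight matrices, and all function names are hypothetical.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][col]
Matrix = List[List[float]]         # 1x1 conv weights indexed as [out_channel][in_channel]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def conv1x1(weights: Matrix, feats: Matrix) -> Matrix:
    """Bias-free 1x1 convolution over a (C_in x L) map: out[o][i] = sum_c weights[o][c] * feats[c][i]."""
    n_in, length = len(feats), len(feats[0])
    return [[sum(weights[o][c] * feats[c][i] for c in range(n_in)) for i in range(length)]
            for o in range(len(weights))]


def coord_attention(x: Tensor3, w_shared: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Simplified coordinate attention: w_shared is (C/r x C); w_h and w_w are (C x C/r)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # 1) Directional average pooling: one value per row (pool over width) and per column (pool over height).
    pool_h = [[sum(x[k][i]) / w for i in range(h)] for k in range(c)]                        # C x H
    pool_w = [[sum(x[k][i][j] for i in range(h)) / h for j in range(w)] for k in range(c)]  # C x W
    # 2) Concatenate along the spatial axis, apply the shared 1x1 reduction, then ReLU.
    joined = [pool_h[k] + pool_w[k] for k in range(c)]                                       # C x (H+W)
    mid = [[max(0.0, v) for v in row] for row in conv1x1(w_shared, joined)]                  # C/r x (H+W)
    # 3) Split into height and width parts; map each back to C channels and gate with a sigmoid.
    a_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, [r[:h] for r in mid])]          # C x H
    a_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, [r[h:] for r in mid])]          # C x W
    # 4) Reweight each position by its row gate and its column gate.
    return [[[x[k][i][j] * a_h[k][i] * a_w[k][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Usage: 4 channels, a 3x5 map, 2 hidden channels, all-zero weights.
x = [[[float(k + i + j) for j in range(5)] for i in range(3)] for k in range(4)]
y = coord_attention(x, zeros(2, 4), zeros(4, 2), zeros(4, 2))
print(y[1][2][3], x[1][2][3])  # 1.5 6.0
```

With all weights set to zero every gate equals sigmoid(0) = 0.5, so each output element is exactly 0.25 times the input, which makes a convenient sanity check. How SCANet wires this module together with the Sandglass block inside SCAneck and SCAblock is not described in the abstract.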


CLC Number: TP183

Kun LI, Qing HOU. Lightweight human pose estimation based on attention mechanism[J]. Journal of Computer Applications, 2022, 42(8): 2407-2414.

References

1	FISCHLER M A, ELSCHLAGER R A. The representation and matching of pictorial structures[J]. IEEE Transactions on Computers, 1973, C-22(1): 67-92. DOI: 10.1109/t-c.1973.223602
2	KIEFEL M, GEHLER P V. Human pose estimation with fields of parts[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 331-346.
3	TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1653-1660. DOI: 10.1109/cvpr.2014.214
4	SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696. DOI: 10.1109/cvpr.2019.00584
5	NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499.
6	ZHOU D Q, HOU Q B, CHEN Y P, et al. Rethinking bottleneck structure for efficient mobile network design[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 680-697.
7	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717. DOI: 10.1109/cvpr46437.2021.01350
8	WANG D F, CHEN C B, MA T L, et al. YOLOv3 pedestrian detection algorithm based on depth-wise separable convolution[J]. Computer Applications and Software, 2020, 37(6): 218-223. DOI: 10.3969/j.issn.1000-386x.2020.06.038
9	DONG Y C, SHAN Y G, YUAN J. Pedestrian detection based on improved SSD[J]. Computer Engineering and Design, 2020, 41(10): 2921-2926. DOI: 10.16208/j.issn1000-7024.2020.10.037
10	HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. DOI: 10.1109/iccv.2017.322
11	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
12	CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310. DOI: 10.1109/cvpr.2017.143
13	WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4724-4732. DOI: 10.1109/cvpr.2016.511
14	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520. DOI: 10.1109/cvpr.2018.00474
15	MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138.
16	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
17	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
18	XIAO Z J, YANG X D, WEI X, et al. Improved lightweight network in image recognition[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(4): 743-753. DOI: 10.3778/j.issn.1673-9418.2004057
19	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105.
20	ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693. DOI: 10.1109/cvpr.2014.471
21	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision. Cham: Springer, 2014: 740-755. DOI: 10.1007/978-3-319-10602-1_48
22	CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112. DOI: 10.1109/cvpr.2018.00742
23	XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487.
24	YU C Q, XIAO B, GAO C X, et al. Lite-HRNet: a lightweight high-resolution network[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10435-10445. DOI: 10.1109/cvpr46437.2021.01030

Comparison with other methods on the COCO validation set (input size 256×192)
Model	Backbone	Input size	Params/10⁶	GFLOPs	mAP/%	AP⁵⁰/%	AP⁷⁵/%	AP^M/%	AP^L/%	AR/%
Hourglass	Hourglass	256×192	25.1	14.3	66.9	—	—	—	—	—
CPN [22]	ResNet-50	256×192	27.0	6.2	68.6	—	—	—	—	—
CPN+OHKM	ResNet-50	256×192	27.0	6.2	69.4	—	—	—	—	—
SimpleBaseline [23]	ResNet-50	256×192	34.0	8.9	70.4	88.6	78.3	67.1	77.2	76.3
HRNet	HRNet	256×192	28.5	7.1	73.4	89.5	80.7	70.2	80.1	78.9
Lite-HRNet [24]	Lite-HRNet-18	256×192	1.1	0.2	64.8	86.7	73.0	62.1	70.5	71.2
Lite-HRNet	Lite-HRNet-30	256×192	1.8	0.3	67.2	88.0	75.0	64.3	73.1	73.3
SCANet	HRNet	256×192	13.5	2.8	72.3	90.0	79.6	69.3	79.1	78.0

Comparison with HRNet on COCO (input size 384×288)
Model	Backbone	Input size	Params/10⁶	GFLOPs	mAP/%	AP⁵⁰/%	AP⁷⁵/%	AP^M/%	AP^L/%	AR/%
HRNet	HRNet	384×288	28.5	16.0	74.9	92.5	82.8	71.3	80.9	80.1
SCANet	HRNet	384×288	13.5	6.2	72.8	92.6	80.7	69.8	79.9	79.0
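The two input sizes can be related by a rough rule of thumb (an assumption, not a claim from the paper): for a fully convolutional network the FLOPs grow in proportion to the number of input pixels, while the parameter count does not depend on input size, which matches the unchanged 28.5 × 10⁶ and 13.5 × 10⁶ parameters at 256×192 and 384×288. The sketch below applies that scaling to the 256×192 figures; the function name `scale_flops` is hypothetical.

```python
from typing import Tuple


def scale_flops(gflops: float, old_hw: Tuple[int, int], new_hw: Tuple[int, int]) -> float:
    """Scale FLOPs by the ratio of input pixel counts (assumes cost is proportional to H * W)."""
    ratio = (new_hw[0] * new_hw[1]) / (old_hw[0] * old_hw[1])
    return gflops * ratio


# 384x288 has 2.25 times as many pixels as 256x192.
for name, gflops_256 in [("HRNet", 7.1), ("SCANet", 2.8)]:
    estimate = scale_flops(gflops_256, (256, 192), (384, 288))
    print(f"{name}: estimated {estimate:.2f} GFLOPs at 384x288")
```

The estimates, about 16.0 GFLOPs for HRNet and 6.3 GFLOPs for SCANet, are close to the reported 16.0 and 6.2 GFLOPs.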

Comparison on the MPII validation set (per-joint keypoint prediction accuracy, %)
Model	Params/10⁶	GFLOPs	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
Hourglass	25.1	19.1	96.5	95.3	88.4	82.5	87.1	83.5	78.3	87.5
SimpleBaseline	68.6	20.9	96.7	95.4	88.6	82.9	87.5	83.8	79.0	87.9
HRNet	28.5	9.5	97.0	95.5	90.0	85.2	88.1	85.1	81.0	89.3
Lite-HRNet-18	1.1	0.3	—	—	—	—	—	—	—	86.1
Lite-HRNet-30	1.8	0.4	—	—	—	—	—	—	—	87.0
SCANet	13.5	3.7	97.2	95.4	89.9	83.7	88.9	84.6	79.8	88.7
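The relative reductions quoted in the abstract can be recomputed from the tables as 100 × (baseline - model) / baseline, with HRNet as the baseline. The short Python check below does this for the parameter count, the COCO FLOPs at 256×192 and the MPII FLOPs; the helper name `reduction_pct` is hypothetical.

```python
def reduction_pct(baseline: float, model: float) -> float:
    """Relative reduction of `model` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - model) / baseline


# (HRNet, SCANet) values copied from the tables above.
COMPARISONS = {
    "parameters (10^6)": (28.5, 13.5),
    "COCO GFLOPs, 256x192": (7.1, 2.8),
    "MPII GFLOPs": (9.5, 3.7),
}

for name, (hrnet, scanet) in COMPARISONS.items():
    print(f"{name}: {reduction_pct(hrnet, scanet):.1f}% lower than HRNet")
```

This prints 52.6%, 60.6% and 61.1%, matching the abstract. The same tables show the accuracy cost of these savings: 1.1 mAP points on COCO at 256×192 (73.4% versus 72.3%) and 0.6 points of mean accuracy on MPII (89.3% versus 88.7%).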
