Lightweight human pose estimation network based on redundant feature suppression

doi:10.11772/j.issn.1001-9081.2025060700

Abstract

To address the difficulty that existing Human Pose Estimation (HPE) networks have in balancing computational efficiency and localization accuracy in complex scenarios, a lightweight HPE network based on redundant feature suppression, named LE-SHNet (Lightweight Enhanced Stacked Hourglass Network), was proposed. Firstly, the Multiple Separated Hourglass Module (MSHM) was designed, which uses heterogeneous convolution branches to model the features of large joints and distal limbs differently while suppressing redundant computation. Then, Shuffle Efficient Channel Attention (SECA) was inserted between MSHMs, combining channel shuffling with adaptive-kernel convolution to strengthen cross-level joint correlations with zero additional parameters. Finally, the Spatial and Channel Perception Module (SCPM) was constructed in the non-MSHM parts of the network to strengthen the perception of key regions through spatial-channel reconstruction and the Triplet Attention (TA) mechanism. Experimental results show that LE-SHNet achieves an Average Precision (AP) of 88.7% on MPII (Max Planck Institute for Informatics) and 71.3% on COCO2017 (Common Objects in COntext 2017). Compared with the baseline network, the Two Stacked Hourglass Network (2-SHNet), LE-SHNet reduces the number of parameters by 49.3% and the computational cost by 28.2% while increasing AP by 1.0 percentage points. Compared with the lightweight HPE networks EL-HRNet (Efficient and Lightweight High-Resolution Network) and MobileMultiPose (Mobile-friendly and Multi-feature aggregation Pose estimation), LE-SHNet improves AP by 1.0 and 0.8 percentage points, respectively, while reducing the number of parameters by 32.0% and 26.7%, respectively. LE-SHNet therefore remains lightweight while improving keypoint localization accuracy, which gives it potential value for real-time deployment on edge devices in scenarios such as intelligent monitoring, human-computer interaction, and sports rehabilitation.
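The relative figures quoted in the abstract can be cross-checked against the parameter counts, FLOPs, and mean PCKh@0.5 values listed for the MPII validation set in Tab. 2 and Tab. 3 (the abstract reports the MPII score as AP). The short Python sketch below only redoes that arithmetic; the helper name `relative_reduction` is an illustrative assumption and does not come from the paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline * 100.0

# Values copied from Tab. 2 and Tab. 3 (MPII validation set).
params = {"2-SHNet": 6.70, "EL-HRNet-32": 5.00, "MobileMultiPose-L": 4.64, "LE-SHNet": 3.40}
flops = {"2-SHNet": 2.52, "LE-SHNet": 1.81}
mean_pckh = {"2-SHNet": 87.7, "EL-HRNet-32": 87.7, "MobileMultiPose-L": 87.9, "LE-SHNet": 88.7}

for name in ("2-SHNet", "EL-HRNet-32", "MobileMultiPose-L"):
    drop = relative_reduction(params[name], params["LE-SHNet"])
    gain = mean_pckh["LE-SHNet"] - mean_pckh[name]
    print(f"vs {name}: parameters -{drop:.1f}%, accuracy +{gain:.1f} points")

flops_drop = relative_reduction(flops["2-SHNet"], flops["LE-SHNet"])
print(f"vs 2-SHNet: FLOPs -{flops_drop:.1f}%")
```

Run as written, the percentages agree with the 49.3%, 32.0%, 26.7%, and 28.2% reductions stated in the abstract.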

Key words: Human Pose Estimation (HPE), Stacked Hourglass Network (SHNet), spatial-channel reconstruction, Triplet Attention (TA), redundant feature suppression, multi-scale feature fusion

CLC Number: TP391.41

Chao LYU, Geyao MA. Lightweight human pose estimation network based on redundant feature suppression[J]. Journal of Computer Applications, 2026, 46(6): 1973-1980.

Figures/Tables 12

Fig. 1 Overall structure of LE-SHNet

Fig. 2 Structure of bottleneck residual module based on SPConv

Fig. 3 Structure of MSHM

Fig. 4 Structure of SECA module

Fig. 5 Structure of SCPM

Tab. 1 Time complexity comparison

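Improved module	Structure in the original network	Time complexity of the original network	Time complexity of the improved network
MSHM	Hourglass module composed of standard bottleneck residual modules	$O(9C^{2} \times H \times W)$	$O((\alpha \times (9/g + 1) + 1 - \alpha) \times C \times H \times W)$
SECA	No attention	0	$O(k \times C + (C/G) \times H \times W \times k^{2})$
SCPM	Standard bottleneck residual module	$O(9C^{2} \times H \times W)$	$O((9/5)C^{2} \times H \times W + 294 \times H \times W)$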
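To get a feel for the expressions in Tab. 1, the following Python sketch evaluates them for one feature-map size. It is only an illustration under stated assumptions: the values of C, H, W, α, g, and G are arbitrary choices, the function names are hypothetical, and the kernel size k is taken from the adaptive rule of ECA-Net [8] (γ = 2, b = 1), which SECA is assumed, not confirmed, to inherit. For MSHM only the bracketed coefficient is computed, exactly as the expression is printed in the table.

```python
import math

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net [8] rule: truncate (log2(C) + b) / gamma, then bump even values to the next odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def bottleneck_cost(c: int, h: int, w: int) -> float:
    """Standard bottleneck residual module: 9 * C^2 * H * W."""
    return 9 * c**2 * h * w

def scpm_cost(c: int, h: int, w: int) -> float:
    """SCPM: (9/5) * C^2 * H * W + 294 * H * W."""
    return (9 / 5) * c**2 * h * w + 294 * h * w

def seca_cost(c: int, h: int, w: int, groups: int, k: int) -> float:
    """SECA: k * C + (C / G) * H * W * k^2."""
    return k * c + (c / groups) * h * w * k**2

def mshm_coefficient(alpha: float, g: int) -> float:
    """Bracketed MSHM coefficient: alpha * (9 / g + 1) + 1 - alpha."""
    return alpha * (9 / g + 1) + 1 - alpha

# Illustrative settings (assumptions, not values reported in the paper).
C, H, W = 256, 64, 64
alpha, g, G = 0.5, 2, 8
k = eca_kernel_size(C)

print(f"k = {k}")
print(f"SCPM / bottleneck cost ratio: {scpm_cost(C, H, W) / bottleneck_cost(C, H, W):.3f}")
print(f"SECA cost: {seca_cost(C, H, W, G, k):.0f}")
print(f"MSHM coefficient: {mshm_coefficient(alpha, g):.2f}")
```

With these settings the SCPM expression comes to roughly one fifth of the standard bottleneck cost, because the constant 294 × H × W term is small next to the C² term.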

Tab. 2 Ablation study results

					PCKh@0.5/% of each predicted keypoint
MSHM	SECA	SCPM	Params/10⁶	FLOPs/10⁹	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
×	×	×	6.7	2.52	96.2	94.6	87.8	81.5	87.9	82.8	78.0	87.7
√	×	×	4.2	1.94	96.4	94.8	88.1	81.6	88.2	82.9	78.1	87.8
×	√	×	6.7	2.63	96.5	95.1	88.6	83.0	87.9	83.4	79.1	88.2
×	×	√	5.9	2.31	96.6	94.8	88.2	81.7	88.4	82.9	78.2	87.9
√	×	√	3.4	1.77	96.4	94.6	87.9	81.6	88.1	82.8	78.0	88.1
×	√	√	5.3	2.17	96.7	95.2	88.6	82.9	88.4	83.6	79.3	88.4
√	√	×	4.2	2.01	96.5	95.2	88.6	82.8	88.2	83.5	79.2	88.3
√	√	√	3.4	1.81	96.8	95.3	88.8	83.2	88.5	84.1	80.2	88.7
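The accuracy columns of Tab. 2 are PCKh@0.5 scores: a predicted keypoint counts as correct when its distance to the ground truth is within 0.5 times the head size of that person. A minimal Python sketch of the metric follows, assuming keypoints are given as (x, y) pairs and the head size is supplied per sample; the function name `pckh` and the data layout are illustrative and are not taken from the paper or from the official MPII evaluation code.

```python
import math

def pckh(pred, gt, head_sizes, visible, thresh=0.5):
    """Percentage of visible keypoints whose error is <= thresh * head size.

    pred, gt:    per-sample lists of (x, y) keypoints
    head_sizes:  one head size per sample, in pixels
    visible:     per-sample lists of booleans marking annotated keypoints
    """
    correct = 0
    total = 0
    for pred_s, gt_s, head, vis_s in zip(pred, gt, head_sizes, visible):
        for (px, py), (gx, gy), vis in zip(pred_s, gt_s, vis_s):
            if not vis:
                continue  # unannotated keypoints are ignored
            total += 1
            if math.hypot(px - gx, py - gy) <= thresh * head:
                correct += 1
    return 100.0 * correct / total if total else 0.0

# Toy example: one person, head size 20 px, so the tolerance is 10 px.
gt = [[(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]]
pred = [[(3.0, 4.0), (10.0, 25.0), (50.0, 50.0)]]
visible = [[True, True, False]]
score = pckh(pred, gt, head_sizes=[20.0], visible=visible)
print(score)  # 50.0: one of the two visible keypoints is within 10 px
```

Per-joint columns such as Head or Ankle would be obtained by restricting the inner loop to the keypoint indices of that joint.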

Tab. 3 Comparison experimental results on MPII validation set

				PCKh@0.5/% of each predicted keypoint
Network type	Network	Params/10⁶	FLOPs/10⁹	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean
Large-scale network	2-SHNet[2]	6.70	2.52	96.2	94.6	87.8	81.5	87.9	82.8	78.0	87.7
	FLPN[23]	22.50	2.80	96.2	95.2	88.6	82.7	88.4	83.6	80.0	88.4
	HRNet-MSSA[24]	28.50	10.30	-	-	-	-	-	-	-	91.5
	MamKPD-B[25]	7.10	3.10	-	-	-	-	-	-	-	90.7
Lightweight network	Lightweight[26]	3.10	0.77	95.6	93.9	85.1	79.5	86.3	80.4	75.5	85.9
	EL-HRNet-32[27]	5.00	2.66	96.7	94.8	87.6	82.2	88.2	82.4	77.9	87.7
	WideHRNet-18[28]	2.70	0.96	-	-	-	-	-	-	-	87.7
	LMFormer-L[29]	4.10	1.90	-	-	-	-	-	-	-	87.6
	HRNet-MSSA-Lite[24]	1.10	0.70	-	-	-	-	-	-	-	83.7
	MobileMultiPose-L[16]	4.64	1.61	-	-	-	-	-	-	-	87.9
	LE-SHNet	3.40	1.81	96.8	95.3	88.8	83.2	88.5	84.1	80.2	88.7

Tab. 4 Comparison experimental results on COCO2017 validation set

Dataset	Network type	Network	Input size	Params/10⁶	FLOPs/10⁹	AP/%	AP⁵⁰/%	AP⁷⁵/%	AR/%
COCO2017 validation set	Large-scale network	2-SHNet[2]	256×192	6.70	2.10	65.6	87.3	73.8	72.0
		HRPVT-L[30]	256×192	25.10	5.40	75.2	90.6	82.4	80.4
		MSPose-L[31]	256×192	27.50	11.00	76.0	90.5	82.7	81.2
		MamKPD-L[25]	256×192	12.40	4.30	77.3	90.8	83.4	82.1
	Lightweight network	Lightweight[26]	256×192	3.10	0.58	65.8	87.7	74.1	72.1
		EL-HRNet-32[27]	256×192	5.00	2.00	67.1	86.4	74.2	74.9
		HRPVT-S[30]	256×192	4.80	1.10	69.7	88.4	77.6	75.1
		LMFormer-L[29]	256×192	4.10	1.40	68.9	88.3	76.4	74.7
		MamKPD-S[25]	256×192	6.30	0.50	75.2	90.4	82.2	75.3
		MSPose-T[31]	256×192	5.80	1.30	67.1	87.3	75.3	73.4
		MobileMultiPose-L[16]	256×192	4.64	1.17	70.4	89.0	78.5	76.3
		LE-SHNet	256×192	3.40	1.42	71.3	89.0	78.2	77.1
COCO2017 test-dev set	Large-scale network	2-SHNet[2]	256×192	6.70	2.10	65.1	89.5	73.2	71.0
		SimpleBaseline[32]	256×192	34.00	8.90	70.0	90.9	77.9	75.6
		MobileNetV2[33]	256×192	9.60	1.48	64.1	89.4	71.8	70.1
		ShuffleNet V2[34]	256×192	7.60	1.30	59.5	87.4	66.0	66.0
	Lightweight network	Lite-HRNet[35]	256×192	1.10	0.20	63.7	88.6	71.1	69.7
		Lightweight[26]	256×192	3.10	0.58	65.3	89.7	73.4	71.3
		EL-HRNet[27]	256×192	5.00	2.00	67.7	89.7	75.5	74.4
		LE-SHNet	256×192	3.40	1.42	70.7	90.8	78.5	76.5
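The AP, AP⁵⁰, AP⁷⁵, and AR columns of Tab. 4 follow the COCO keypoint protocol, in which a predicted pose is matched to a ground-truth pose through Object Keypoint Similarity (OKS). AP⁵⁰ and AP⁷⁵ use OKS thresholds of 0.50 and 0.75, and AP averages over thresholds from 0.50 to 0.95 in steps of 0.05. The Python sketch below shows the OKS formula for a single pose pair; the function name `oks` and the sigma values in the example are illustrative assumptions, not the official per-keypoint constants.

```python
import math

def oks(pred, gt, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt:    lists of (x, y) keypoints
    visibility:  COCO visibility flags; keypoints with flag 0 are skipped
    area:        ground-truth object area (the squared scale s^2 in the formula)
    sigmas:      per-keypoint falloff constants (k_i = 2 * sigma_i)
    """
    similarity_sum = 0.0
    labeled = 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoint
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2 * sigma
        similarity_sum += math.exp(-d2 / (2 * area * k**2))
        labeled += 1
    return similarity_sum / labeled if labeled else 0.0

# Toy example with two labeled keypoints and one unlabeled keypoint.
gt = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
pred = [(1.0, 1.0), (10.0, 10.0), (99.0, 99.0)]
value = oks(pred, gt, visibility=[2, 2, 0], area=100.0, sigmas=[0.05, 0.05, 0.05])
print(f"OKS = {value:.3f}")
```

A perfect prediction gives OKS = 1, and the score decays faster for small objects and for keypoints with small sigma, which is why the same pixel error costs more on a wrist than on a hip.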

Fig. 6 Visualization of pose estimation results of LE-SHNet on COCO2017 dataset

Tab. 5 Comparison experimental results of inference speed

Network	Input size	AP/%	Inference time on edge device/ms	Inference time on CPU/ms
2-SHNet[2]	256×192	65.6	24.26	15.08
RSN-18[36]	256×192	70.4	21.24	11.99
SimCC[37]	256×192	68.6	22.75	26.69
RTMPose-S[38]	256×192	68.5	16.65	8.63
EdgeNet-S[39]	256×192	69.5	19.26	12.63
LE-SHNet	256×192	71.3	15.76	6.87
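The inference times in Tab. 5 can be turned into throughput and relative speed, which is often easier to compare. The Python sketch below does this conversion, assuming each reported time covers a single image per forward pass (the table does not state the batch size); the helper names `fps` and `speedup` are illustrative.

```python
# Inference times in ms copied from Tab. 5: (edge device, CPU).
latency_ms = {
    "2-SHNet": (24.26, 15.08),
    "RSN-18": (21.24, 11.99),
    "SimCC": (22.75, 26.69),
    "RTMPose-S": (16.65, 8.63),
    "EdgeNet-S": (19.26, 12.63),
    "LE-SHNet": (15.76, 6.87),
}

def fps(ms: float) -> float:
    """Frames per second, assuming one image per forward pass."""
    return 1000.0 / ms

def speedup(other_ms: float, ours_ms: float) -> float:
    """How many times faster a model taking `ours_ms` is than one taking `other_ms`."""
    return other_ms / ours_ms

ours_edge, ours_cpu = latency_ms["LE-SHNet"]
for name, (edge, cpu) in latency_ms.items():
    print(
        f"{name:10s} edge: {fps(edge):5.1f} FPS (LE-SHNet x{speedup(edge, ours_edge):.2f})  "
        f"CPU: {fps(cpu):5.1f} FPS (LE-SHNet x{speedup(cpu, ours_cpu):.2f})"
    )
```

Under that assumption, LE-SHNet runs at roughly 63 frames per second on the edge device and is about 1.5 times faster there than the 2-SHNet baseline.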

Fig. 7 Comparison of inference results between LE-SHNet and 2-SHNet (MPII dataset)

References 39

[1]	CHEN J Y, GUO S J, CHEN L L. Lightweight human pose estimation based on decoupled attention and ghost convolution[J]. Journal of Computer Applications, 2025, 45(1): 223-233.
[2]	NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499.
[3]	KIM S T, LEE H J. Lightweight stacked hourglass network for human pose estimation[J]. Applied Sciences, 2020, 10(18): 6497.
[4]	CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1800-1807.
[5]	ZHANG Q, JIANG Z, LU Q, et al. Split to be slim: an overlooked redundancy in vanilla convolution[C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2020: 3195-3201.
[6]	LI J, WEN Y, HE L. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 6153-6162.
[7]	MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3138-3147.
[8]	WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539.
[9]	ZHANG Q L, YANG Y B. SA-Net: shuffle attention for deep convolutional neural networks[C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 2235-2239.
[10]	ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693.
[11]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
[12]	ANDRILUKA M, ROTH S, SCHIELE B. Monocular 3D pose estimation and tracking by detection[C]// Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2010: 623-630.
[13]	FISCHLER M A, ELSCHLAGER R A. The representation and matching of pictorial structures[J]. IEEE Transactions on Computers, 1973, C-22(1): 67-92.
[14]	FELZENSZWALB P F, HUTTENLOCHER D P. Pictorial structures for object recognition[J]. International Journal of Computer Vision, 2005, 61(1): 55-79.
[15]	ESMAIL M A, WANG J, WANG Y, et al. Resource-aware strategies for real-time multi-person pose estimation[J]. Image and Vision Computing, 2025, 155: No.105441.
[16]	LI B, TANG S, LI W. Mobile-friendly and multi-feature aggregation via Transformer for human pose estimation[J]. Image and Vision Computing, 2025, 153: No.105343.
[17]	DAI Q, LING Q. Hybrid representation learning for end-to-end multi-person pose estimation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2025, 35(7): 6437-6451.
[18]	LV C, MA G. PoseNet++: a multi-scale and optimized feature extraction network for high-precision human pose estimation[J]. PLoS ONE, 2025, 20(6): No.e0326232.
[19]	HUA G, LI L, LIU S. Multipath affinage stacked-hourglass networks for human pose estimation[J]. Frontiers of Computer Science, 2020, 14(4): No.144701.
[20]	XIAO Y, YU D, WANG X, et al. SPCNet: spatial preserve and content-aware network for human pose estimation[C]// Proceedings of the 24th European Conference on Artificial Intelligence. Amsterdam: IOS Press, 2020: 2776-2783.
[21]	BAO W, YANG Y, LIANG D, et al. Multi-residual module stacked hourglass networks for human pose estimation[J]. Journal of Beijing Institute of Technology, 2020, 29(1): 110-119.
[22]	ZOU X, BI X, YU C. Improving human pose estimation based on stacked hourglass network[J]. Neural Processing Letters, 2023, 55(7): 9521-9544.
[23]	REN H, WANG W, ZHANG K, et al. Fast and lightweight human pose estimation[J]. IEEE Access, 2021, 9: 49576-49589.
[24]	ZHANG T, LI Q, WEN J, et al. Enhancement and optimisation of human pose estimation with multi-scale spatial attention and adversarial data augmentation[J]. Information Fusion, 2024, 111: No.102522.
[25]	DANG Y, LIU L, KANG H, et al. MamKPD: a simple mamba baseline for real-time 2D keypoint detection[EB/OL]. [2025-06-23].
[26]	LI S, XIANG X. Lightweight human pose estimation using heatmap-weighting loss[EB/OL]. [2025-06-23].
[27]	LI R, YAN A, YANG S, et al. Human pose estimation based on Efficient and Lightweight High-Resolution Network (EL-HRNet)[J]. Sensors, 2024, 24(2): No.396.
[28]	SAMKARI E, ARIF M, ALGHAMDI M, et al. WideHRNet: an efficient model for human pose estimation using wide channels in lightweight high-resolution network[J]. IEEE Access, 2024, 12: 148990-149000.
[29]	LI B, TANG S, LI W. LMFormer: lightweight and multi-feature perspective via Transformer for human pose estimation[J]. Neurocomputing, 2024, 594: No.127884.
[30]	XU Z, DAI M, ZHANG Q, et al. HRPVT: high-resolution pyramid vision Transformer for medium and small-scale human pose estimation[J]. Neurocomputing, 2025, 619: No.129154.
[31]	YUAN X, CHENG P, HAN S. Multi-supervision Transformer combining bounding box and mask for data-limited pose estimation[J]. Neurocomputing, 2024, 571: No.127209.
[32]	XIAO B, WU H, WEI Y. Simple baselines for human pose estimation and tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487.
[33]	SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520.
[34]	MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138.
[35]	YU C, XIAO B, GAO C, et al. Lite-HRNet: a lightweight high-resolution network[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10435-10445.
[36]	CAI Y, WANG Z, LUO Z, et al. Learning delicate local representations for multi-person pose estimation[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 455-472.
[37]	LI Y, YANG S, LIU P, et al. SimCC: a simple coordinate classification perspective for human pose estimation[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13666. Cham: Springer, 2022: 89-106.
[38]	JIANG T, LU P, ZHANG L, et al. RTMPose: real-time multi-person pose estimation based on MMPose[EB/OL]. [2025-06-23].
[39]	ZHANG L, HUANG W, ZHENG J, et al. EdgePose: real-time human pose estimation scheme for industrial scenes[J]. IEEE Access, 2024, 12: 156702-156716.