Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (1): 223-233. DOI: 10.11772/j.issn.1001-9081.2024010099

• Multimedia computing and computer simulation •
Junying CHEN1, Shijie GUO1,2,3, Lingling CHEN4
Received: 2024-01-26; Revised: 2024-03-25; Accepted: 2024-03-25; Online: 2024-05-09; Published: 2025-01-10

Contact: Shijie GUO

About author: CHEN Junying, born in 2000 in Changde, Hunan, M. S. candidate. His research interests include computer vision and human pose estimation.
Junying CHEN, Shijie GUO, Lingling CHEN. Lightweight human pose estimation based on decoupled attention and ghost convolution[J]. Journal of Computer Applications, 2025, 45(1): 223-233.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024010099
| Layer | Output size | Operator | Resolution branches | Output channels | Repetitions | Modules (DGLNet-18) | Modules (DGLNet-30) |
|---|---|---|---|---|---|---|---|
| Image | 256×256 | | 1× | 3 | | | |
| stem | 64×64 | conv2d | 2× | 32 | 1 | 1 | 1 |
| | | DFDbottleneck | 4× | 32 | 1 | | |
| stage2 | 64×64 | DGBblock | 4×, 8× | 40, 80 | 2 | 2 | 3 |
| | | GSCtransition | 4×, 8× | 40, 80 | 1 | | |
| stage3 | 64×64 | DGBblock | 4×, 8×, 16× | 40, 80, 160 | 2 | 4 | 8 |
| | | GSCtransition | 4×, 8×, 16× | 40, 80, 160 | 1 | | |
| stage4 | 64×64 | DGBblock | 4×, 8×, 16×, 32× | 40, 80, 160, 320 | 2 | 2 | 3 |
| | | GSCtransition | 4×, 8×, 16×, 32× | 40, 80, 160, 320 | 1 | | |
Tab. 1 Information of each layer module in DGLNet
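For readers who want to reproduce the backbone, the stage layout in Tab. 1 can be captured as a plain configuration structure. The sketch below is a minimal illustration only; the names `DGLNET_CFG`, `num_branches`, `num_blocks`, `num_channels`, and `num_modules` are hypothetical conveniences that mirror the columns of Tab. 1, not identifiers from the authors' released code.

```python
# Hypothetical stage configuration mirroring Tab. 1.
# "num_blocks" is the repetition count per module, "num_channels" the per-branch
# widths, and "num_modules" the module counts for DGLNet-18 vs. DGLNet-30.
DGLNET_CFG = {
    "stem":   {"out_channels": 32},  # conv2d (2x) followed by DFDbottleneck (4x)
    "stage2": {"num_branches": 2, "num_blocks": 2,
               "num_channels": (40, 80),
               "num_modules": {"DGLNet-18": 2, "DGLNet-30": 3}},
    "stage3": {"num_branches": 3, "num_blocks": 2,
               "num_channels": (40, 80, 160),
               "num_modules": {"DGLNet-18": 4, "DGLNet-30": 8}},
    "stage4": {"num_branches": 4, "num_blocks": 2,
               "num_channels": (40, 80, 160, 320),
               "num_modules": {"DGLNet-18": 2, "DGLNet-30": 3}},
}
```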
| Group | Model | Backbone | Input size | Params/10⁶ | GFLOPs | AP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Large networks | Hourglass | Hourglass | 256×192 | 25.1 | 14.3 | 66.9 | — | — | — | — | — |
| | CPN | ResNet-50 | 256×192 | 27.0 | 6.2 | 68.6 | — | — | — | — | — |
| | SimpleBaseline | ResNet-50 | 256×192 | 34.0 | 8.9 | 70.4 | 88.6 | 78.3 | 67.1 | 77.2 | 76.3 |
| | HRNetV1 | HRNetV1-W32 | 256×192 | 28.5 | 7.1 | 73.4 | 89.5 | 80.7 | 70.2 | 80.1 | 78.9 |
| | DARK | HRNetV1-W48 | 128×96 | 63.6 | 3.6 | 71.9 | 89.1 | 79.6 | 69.2 | 78.0 | 77.9 |
| Small networks | MobileNetV2 | MobileNetV2 | 256×192 | 9.6 | 1.4 | 64.6 | 87.4 | 72.3 | 61.1 | 71.2 | 70.7 |
| | MobileNetV2 | MobileNetV2 | 384×288 | 9.6 | 3.3 | 67.3 | 87.9 | 74.3 | 62.8 | 74.7 | 72.9 |
| | ShuffleNetV2 | ShuffleNetV2 | 256×192 | 7.6 | 1.2 | 59.9 | 85.4 | 66.3 | 56.6 | 66.2 | 66.4 |
| | ShuffleNetV2 | ShuffleNetV2 | 384×288 | 7.6 | 2.8 | 63.6 | 86.5 | 70.5 | 59.5 | 70.7 | 69.7 |
| | Small HRNet | HRNet-W16 | 256×192 | 1.3 | 0.5 | 55.2 | 83.7 | 62.4 | 52.3 | 61.0 | 62.1 |
| | Small HRNet | HRNet-W16 | 384×288 | 1.3 | 1.2 | 56.0 | 83.8 | 63.0 | 52.4 | 62.6 | 62.6 |
| | DY-MobileNetV2 | DY-MobileNetV2 | 256×192 | 16.1 | 1.0 | 68.2 | 88.4 | 76.0 | 65.0 | 74.7 | 74.2 |
| | DY-ReLU | MobileNetV2 | 256×192 | 9.0 | 1.0 | 68.1 | 88.5 | 76.2 | 64.8 | 74.3 | — |
| | Lite-HRNet | Lite-HRNet-18 | 256×192 | 1.1 | 0.2 | 64.8 | 86.7 | 73.0 | 62.1 | 70.5 | 71.2 |
| | Lite-HRNet | Lite-HRNet-18 | 384×288 | 1.1 | 0.4 | 67.6 | 87.8 | 75.0 | 64.5 | 73.7 | 73.7 |
| | Lite-HRNet | Lite-HRNet-30 | 256×192 | 1.8 | 0.3 | 67.2 | 88.0 | 75.0 | 64.3 | 73.1 | 73.3 |
| | Lite-HRNet | Lite-HRNet-30 | 384×288 | 1.8 | 0.7 | 70.4 | 88.7 | 77.7 | 67.5 | 76.3 | 76.2 |
| | DGLNet | DGLNet-18 | 256×192 | 1.1 | 0.2 | 66.1 | 89.4 | 73.2 | 64.0 | 71.9 | 71.8 |
| | DGLNet | DGLNet-18 | 384×288 | 1.1 | 0.4 | 68.5 | 89.5 | 76.0 | 65.9 | 73.9 | 74.1 |
| | DGLNet | DGLNet-30 | 256×192 | 1.8 | 0.3 | 68.4 | 89.7 | 76.1 | 65.9 | 74.2 | 73.8 |
| | DGLNet | DGLNet-30 | 384×288 | 1.8 | 0.7 | 71.9 | 89.9 | 78.2 | 68.8 | 77.3 | 76.9 |
Tab. 2 Performance comparison on COCO validation set
| Group | Model | Backbone | Input size | Params/10⁶ | GFLOPs | AP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Large networks | SimpleBaseline | ResNet-50 | 256×192 | 34.0 | 8.9 | 70.0 | 90.9 | 77.9 | 66.8 | 75.8 | 75.6 |
| | CPN | ResNet-Inception | 384×288 | — | — | 72.1 | 91.4 | 80.0 | 68.7 | 77.2 | 78.5 |
| | HRNetV1 | HRNetV1-W32 | 384×288 | 28.5 | 16.0 | 74.9 | 92.5 | 82.8 | 71.3 | 80.9 | 80.1 |
| | DARK | HRNetV1-W48 | 384×288 | 63.6 | 32.9 | 76.2 | 92.5 | 83.6 | 72.5 | 82.4 | 81.1 |
| Small networks | MobileNetV2 | MobileNetV2 | 384×288 | 9.8 | 3.3 | 66.8 | 90.0 | 74.0 | 62.6 | 73.3 | 72.3 |
| | ShuffleNetV2 | ShuffleNetV2 | 384×288 | 7.6 | 2.8 | 62.9 | 88.5 | 69.4 | 58.9 | 69.3 | 68.9 |
| | Small HRNet | HRNet-W16 | 384×288 | 1.3 | 1.2 | 55.2 | 85.8 | 61.4 | 51.7 | 61.2 | 61.5 |
| | Lite-HRNet | Lite-HRNet-18 | 384×288 | 1.1 | 0.4 | 66.9 | 89.4 | 74.4 | 64.0 | 72.2 | 72.6 |
| | Lite-HRNet | Lite-HRNet-30 | 384×288 | 1.8 | 0.7 | 69.7 | 90.7 | 77.5 | 66.9 | 75.0 | 75.4 |
| | DGLNet | DGLNet-18 | 384×288 | 1.1 | 0.4 | 68.6 | 90.1 | 75.7 | 65.3 | 74.0 | 74.4 |
| | DGLNet | DGLNet-30 | 384×288 | 1.8 | 0.7 | 71.0 | 90.9 | 77.9 | 67.3 | 76.5 | 76.7 |
Tab. 3 Performance comparison on COCO test set
| Model | Params/10⁶ | GFLOPs | PCKh@0.5/% |
|---|---|---|---|
| MobileNetV2 | 9.6 | 1.9 | 85.4 |
| MobileNetV3 | 8.7 | 1.8 | 84.3 |
| ShuffleNetV2 | 7.6 | 1.7 | 82.8 |
| Small HRNet-W16 | 1.3 | 0.7 | 80.2 |
| Lite-HRNet-18 | 1.1 | 0.2 | 86.1 |
| Lite-HRNet-30 | 1.8 | 0.4 | 87.0 |
| DGLNet-18 (proposed) | 1.1 | 0.2 | 86.8 |
| DGLNet-30 (proposed) | 1.8 | 0.4 | 87.7 |
Tab. 4 Performance comparison on MPII validation set (PCKh@0.5)
| Model | Params/10⁶ | GFLOPs | AP/% |
|---|---|---|---|
| Small HRNet | 1.30 | 0.50 | 55.2 |
| Small HRNet + DFDbottleneck | 1.34 | 0.51 | 59.8 |
| Small HRNet + DGBblock | 1.13 | 0.34 | 61.7 |
| Small HRNet + GSCtransition | 1.21 | 0.39 | 60.2 |
| DGLNet-18 | 1.10 | 0.21 | 66.3 |
Tab. 5 Ablation experimental results of network lightweight module
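The lightweight modules ablated in Tab. 5 are built on ghost convolution (GhostNet [17]), where a small primary convolution produces a few intrinsic feature maps and a cheap depthwise operation generates the remaining "ghost" maps. The following PyTorch sketch shows this idea in its generic form; the `GhostModule` class name, the ratio of 2, and the 3×3 cheap kernel are common defaults assumed here, not values confirmed for DGLNet.

```python
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Generic ghost convolution in the spirit of GhostNet [17]:
    primary 1x1 conv -> intrinsic features; cheap depthwise conv -> ghost features;
    both are concatenated to form the output. Illustrative sketch only."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)      # intrinsic feature channels
        cheap_ch = init_ch * (ratio - 1)         # ghost feature channels
        self.out_ch = out_ch
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),   # depthwise "cheap" operation
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y1 = self.primary(x)
        y2 = self.cheap(y1)
        return torch.cat([y1, y2], dim=1)[:, :self.out_ch]
```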
| Model | Params/10⁶ | GFLOPs | AP/% |
|---|---|---|---|
| Small HRNet | 1.30 | 0.50 | 55.2 |
| Small HRNet + DFDbottleneck (with DFC) | 1.34 | 0.51 | 59.8 |
| Small HRNet + DFDbottleneck (without DFC) | 1.29 | 0.49 | 59.3 |
| Small HRNet + DGBblock (with DFC) | 1.13 | 0.34 | 61.7 |
| Small HRNet + DGBblock (without DFC) | 1.01 | 0.25 | 60.9 |
| Small HRNet + GSCtransition (with DFC) | 1.21 | 0.39 | 60.2 |
| Small HRNet + GSCtransition (without DFC) | 1.12 | 0.35 | 59.5 |
Tab. 6 Ablation experimental results of decoupled attention module
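The "with/without DFC" rows in Tab. 6 ablate decoupled fully connected (DFC) attention, introduced in GhostNetV2 [16]: long-range dependencies are aggregated by two decoupled 1-D depthwise convolutions (horizontal and vertical) applied to a downsampled copy of the input, and the resulting map re-weights the main-branch features. The sketch below assumes the GhostNetV2 formulation; the kernel size of 5, the 2× downsampling, and the class name `DFCAttention` are assumptions, not values reported by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DFCAttention(nn.Module):
    """Decoupled fully connected (DFC) attention sketch after GhostNetV2 [16]:
    1x1 conv + horizontal and vertical 1-D depthwise convs on a downsampled map,
    sigmoid gate, then upsample and multiply with the main-branch features."""
    def __init__(self, in_ch, out_ch, kernel=5):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, (1, kernel), padding=(0, kernel // 2),
                      groups=out_ch, bias=False),    # horizontal aggregation
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, (kernel, 1), padding=(kernel // 2, 0),
                      groups=out_ch, bias=False),    # vertical aggregation
            nn.BatchNorm2d(out_ch))

    def forward(self, x, features):
        # x: block input; features: main-branch output to be re-weighted.
        attn = torch.sigmoid(self.gate(F.avg_pool2d(x, 2)))
        attn = F.interpolate(attn, size=features.shape[-2:], mode="nearest")
        return features * attn
```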
| 1 | ZHENG C, WU W, CHEN C, et al. Deep learning-based human pose estimation: a survey [J]. ACM Computing Surveys, 2023, 56(1): No.11. | 
| 2 | XIAO B, WU H, WEI Y. Simple baselines for human pose estimation and tracking [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487. | 
| 3 | WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4724-4732. | 
| 4 | NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499. | 
| 5 | CHU X, YANG W, OUYANG W, et al. Multi-context attention for human pose estimation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5669-5678. | 
| 6 | YANG W, LI S, OUYANG W, et al. Learning feature pyramids for human pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1290-1299. | 
| 7 | SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696. | 
| 8 | HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2024-02-08]. |
| 9 | SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520. | 
| 10 | HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3 [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1314-1324. | 
| 11 | ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856. | 
| 12 | MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138. | 
| 13 | TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks [C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114. | 
| 14 | TAN M, LE Q V. EfficientNetV2: smaller models and faster training [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 10096-10106. | 
| 15 | CUI C, GAO T, WEI S, et al. PP-LCNet: a lightweight CPU convolutional neural network [EB/OL]. [2023-10-08]. |
| 16 | TANG Y, HAN K, GUO J, et al. GhostNetV2: enhance cheap operation with long-range attention [C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 9969-9982. | 
| 17 | HAN K, WANG Y, TIAN Q, et al. GhostNet: more features from cheap operations [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. | 
| 18 | WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364. | 
| 19 | CHEN Y, WANG Z, PENG Y, et al. Cascaded pyramid network for multi-person pose estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112. | 
| 20 | FANG H S, XIE S, TAI Y W, et al. RMPE: regional multi-person pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2353-2362. | 
| 21 | CAI Y, WANG Z, LUO Z, et al. Learning delicate local representations for multi-person pose estimation [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 455-472. | 
| 22 | WANG J, LONG X, GAO Y, et al. Graph-PCNN: two stage human pose estimation with graph pose refinement [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 492-508. | 
| 23 | PAPANDREOU G, ZHU T, KANAZAWA N, et al. Towards accurate multi-person pose estimation in the wild [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3711-3719. | 
| 24 | YU C, XIAO B, GAO C, et al. Lite-HRNet: a lightweight high-resolution network [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10435-10445. | 
| 25 | PISHCHULIN L, INSAFUTDINOV E, TANG S, et al. DeepCut: joint subset partition and labeling for multi person pose estimation [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4929-4937. | 
| 26 | INSAFUTDINOV E, PISHCHULIN L, ANDRES B, et al. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 34-50. | 
| 27 | CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310. | 
| 28 | KREISS S, BERTONI L, ALAHI A. PifPaf: composite fields for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 11969-11978. | 
| 29 | NEWELL A, HUANG Z, DENG J. Associative embedding: end-to-end learning for joint detection and grouping [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2274-2284. | 
| 30 | CHENG B, XIAO B, WANG J, et al. HigherHRNet: scale-aware representation learning for bottom-up human pose estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5385-5394. | 
| 31 | IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. [2023-10-08]. |
| 32 | ZHOU D, HOU Q, CHEN Y, et al. Rethinking bottleneck structure for efficient mobile network design [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 680-697. | 
| 33 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-11-10]. |
| 34 | WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. | 
| 35 | LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002. | 
| 36 | MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer [EB/OL]. [2023-09-08]. |
| 37 | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755. | 
| 38 | ZHANG F, ZHU X, DAI H, et al. Distribution-aware coordinate representation for human pose estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7091-7100. | 
| 39 | CHEN Y, DAI X, LIU M, et al. Dynamic convolution: attention over convolution kernels [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11027-11036. | 
| 40 | CHEN Y, DAI X, LIU M, et al. Dynamic ReLU [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12364. Cham: Springer, 2020: 351-367. | 
| 41 | ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693. | 