Lightweight human pose estimation based on decoupled attention and ghost convolution

doi:10.11772/j.issn.1001-9081.2024010099

Journal of Computer Applications, 2025, Vol. 45, Issue (1): 223-233.

• Multimedia computing and computer simulation •

Junying CHEN¹, Shijie GUO¹,²,³, Lingling CHEN⁴

1. Academy for Engineering and Technology, Fudan University, Shanghai 200433, China
2. School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
3. Intelligent Rehabilitation Devices and Detection Technology Engineering Research Center of the Ministry of Education (Hebei University of Technology), Tianjin 300401, China
4. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300130, China

Received: 2024-01-26  Revised: 2024-03-25  Accepted: 2024-03-25  Online: 2024-05-09  Published: 2025-01-10
Contact: Shijie GUO
About the authors: CHEN Junying, born in 2000 in Changde, Hunan, M. S. candidate. His research interests include computer vision and human pose estimation.
CHEN Lingling, born in 1981 in Zhangjiakou, Hebei, Ph. D., professor. Her research interests include computer vision and robots for the elderly and disabled.
Supported by:
Science and Technology Program of Hebei Province (22372001D); Natural Science Foundation of Hebei Province (F2021202021)

Abstract

With the development of lightweight networks, human pose estimation can be performed on devices with limited computational resources. However, improving accuracy has become more challenging. The difficulty stems mainly from the conflict between network complexity and available computational resources: simplifying a model sacrifices its representation capability. To address this problem, a Decoupled attention and Ghost convolution based Lightweight human pose estimation Network (DGLNet) was proposed. Specifically, DGLNet takes the Small High-Resolution Network (Small HRNet) model as its basic architecture and constructs the DFDbottleneck module by introducing a decoupled attention mechanism. The basic modules were redesigned with the shuffleblock structure, in which computationally intensive pointwise convolutions were replaced with lightweight ghost convolutions and the decoupled attention mechanism was used to enhance module performance, yielding the DGBblock module. In addition, the original transition layer modules were replaced with depthwise separable convolution modules rebuilt with ghost convolution and decoupled attention, yielding the GSCtransition module, which further reduces computation while enhancing feature interaction and improving performance. Experimental results on the COCO validation set show that DGLNet outperforms the state-of-the-art Lite-High-Resolution Network (Lite-HRNet) model, reaching a maximum accuracy of 71.9% without increasing computational cost or the number of parameters. Compared with the common lightweight pose estimation networks MobileNetV2 and ShuffleNetV2, DGLNet improves accuracy by 4.6 and 8.3 percentage points, respectively, while using only 21.2% and 25.0% of their computation. Furthermore, under the AP⁵⁰ evaluation criterion, DGLNet surpasses the large High-Resolution Network (HRNet) while requiring far less computation and far fewer parameters.

Key words: human pose estimation, lightweight network, attention mechanism, ghost convolution, depthwise separable convolution module
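
The abstract's claim that ghost convolution is cheaper than pointwise convolution can be illustrated with a rough weight count. The sketch below assumes the GhostNet-style formulation of reference [17] (a primary 1×1 convolution produces a fraction of the output channels, and cheap depthwise convolutions generate the rest). The function names, `ratio=2` and `dw_kernel=3` are illustrative assumptions, not settings reported for DGLNet.

```python
import math


def pointwise_conv_params(c_in: int, c_out: int) -> int:
    """Weights of a 1x1 (pointwise) convolution, bias ignored."""
    return c_in * c_out


def ghost_conv_params(c_in: int, c_out: int, ratio: int = 2, dw_kernel: int = 3) -> int:
    """Weights of a GhostNet-style module, bias ignored (assumed formulation)."""
    intrinsic = math.ceil(c_out / ratio)       # channels from the primary 1x1 conv
    primary = c_in * intrinsic                 # primary pointwise convolution
    # each intrinsic channel spawns (ratio - 1) "ghost" maps via a depthwise conv
    cheap = intrinsic * (ratio - 1) * dw_kernel * dw_kernel
    return primary + cheap


# Channel widths taken from the DGLNet branch configuration (40, 80, 160, 320).
for channels in (40, 80, 160, 320):
    pw = pointwise_conv_params(channels, channels)
    gh = ghost_conv_params(channels, channels)
    print(f"{channels:>3} channels: pointwise={pw}, ghost={gh}, ratio={gh / pw:.3f}")
```

Under these assumptions a 40-to-40-channel layer needs 1600 weights as a pointwise convolution and 980 as a ghost module, and the saving approaches one half as the channel count grows.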

CLC Number:

TP181

Junying CHEN, Shijie GUO, Lingling CHEN. Lightweight human pose estimation based on decoupled attention and ghost convolution[J]. Journal of Computer Applications, 2025, 45(1): 223-233.

Figures/Tables: 13

References: 41

[1]	ZHENG C, WU W, CHEN C, et al. Deep learning-based human pose estimation: a survey [J]. ACM Computing Surveys, 2023, 56(1): No.11.
[2]	XIAO B, WU H, WEI Y. Simple baselines for human pose estimation and tracking [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487.
[3]	WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4724-4732.
[4]	NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499.
[5]	CHU X, YANG W, OUYANG W, et al. Multi-context attention for human pose estimation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5669-5678.
[6]	YANG W, LI S, OUYANG W, et al. Learning feature pyramids for human pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1290-1299.
[7]	SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696.
[8]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2024-02-08].
[9]	SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520.
[10]	HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3 [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1314-1324.
[11]	ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856.
[12]	MA N, ZHANG X, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138.
[13]	TAN M, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks [C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114.
[14]	TAN M, LE Q V. EfficientNetV2: smaller models and faster training [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 10096-10106.
[15]	CUI C, GAO T, WEI S, et al. PP-LCNet: a lightweight CPU convolutional neural network [EB/OL]. [2023-10-08].
[16]	TANG Y, HAN K, GUO J, et al. GhostNetV2: enhance cheap operation with long-range attention [C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 9969-9982.
[17]	HAN K, WANG Y, TIAN Q, et al. GhostNet: more features from cheap operations [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586.
[18]	WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364.
[19]	CHEN Y, WANG Z, PENG Y, et al. Cascaded pyramid network for multi-person pose estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112.
[20]	FANG H S, XIE S, TAI Y W, et al. RMPE: regional multi-person pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2353-2362.
[21]	CAI Y, WANG Z, LUO Z, et al. Learning delicate local representations for multi-person pose estimation [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 455-472.
[22]	WANG J, LONG X, GAO Y, et al. Graph-PCNN: two stage human pose estimation with graph pose refinement [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 492-508.
[23]	PAPANDREOU G, ZHU T, KANAZAWA N, et al. Towards accurate multi-person pose estimation in the wild [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3711-3719.
[24]	YU C, XIAO B, GAO C, et al. Lite-HRNet: a lightweight high-resolution network [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10435-10445.
[25]	PISHCHULIN L, INSAFUTDINOV E, TANG S, et al. DeepCut: joint subset partition and labeling for multi person pose estimation [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4929-4937.
[26]	INSAFUTDINOV E, PISHCHULIN L, ANDRES B, et al. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 34-50.
[27]	CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310.
[28]	KREISS S, BERTONI L, ALAHI A. PifPaf: composite fields for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 11969-11978.
[29]	NEWELL A, HUANG Z, DENG J. Associative embedding: end-to-end learning for joint detection and grouping [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2274-2284.
[30]	CHENG B, XIAO B, WANG J, et al. HigherHRNet: scale-aware representation learning for bottom-up human pose estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5385-5394.
[31]	IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. [2023-10-08].
[32]	ZHOU D, HOU Q, CHEN Y, et al. Rethinking bottleneck structure for efficient mobile network design [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 680-697.
[33]	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-11-10].
[34]	WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803.
[35]	LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
[36]	MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer [EB/OL]. [2023-09-08].
[37]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
[38]	ZHANG F, ZHU X, DAI H, et al. Distribution-aware coordinate representation for human pose estimation [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7091-7100.
[39]	CHEN Y, DAI X, LIU M, et al. Dynamic convolution: attention over convolution kernels [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11027-11036.
[40]	CHEN Y, DAI X, LIU M, et al. Dynamic ReLU [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12364. Cham: Springer, 2020: 351-367.
[41]	ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693.

Layer	Output size	Operator	Resolution branch	Output channels	Repeats	Modules (DGLNet-18)	Modules (DGLNet-30)
Image	256×256		1×	3
stem	64×64	conv2d	2×	32	1	1	1
stem	64×64	DFDbottleneck	4×	32	1	1	1
stage2	64×64	DGBblock	4×, 8×	40, 80	2	2	3
stage2	64×64	GSCtransition	4×, 8×	40, 80	1	2	3
stage3	64×64	DGBblock	4×, 8×, 16×	40, 80, 160	2	4	8
stage3	64×64	GSCtransition	4×, 8×, 16×	40, 80, 160	1	4	8
stage4	64×64	DGBblock	4×, 8×, 16×, 32×	40, 80, 160, 320	2	2	3
stage4	64×64	GSCtransition	4×, 8×, 16×, 32×	40, 80, 160, 320	1	2	3
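
The stage layout in the table above can be written down as a small configuration, which makes the per-branch resolutions and block counts easy to derive. This is only a sketch under two assumptions that the table does not state explicitly: that "N×" denotes the downsampling factor relative to the 256×256 input, and that the number of DGBblocks in a stage equals modules times repeats. All names are hypothetical.

```python
INPUT_SIZE = 256  # input side length from the "Image" row

# (stage, branch strides, branch channels, DGBblock repeats per module,
#  modules in DGLNet-18, modules in DGLNet-30), copied from the table
STAGES = [
    ("stage2", (4, 8), (40, 80), 2, 2, 3),
    ("stage3", (4, 8, 16), (40, 80, 160), 2, 4, 8),
    ("stage4", (4, 8, 16, 32), (40, 80, 160, 320), 2, 2, 3),
]


def branch_resolutions(strides, input_size=INPUT_SIZE):
    """Side length of each branch's feature map, assuming N× is a stride."""
    return [input_size // s for s in strides]


def total_dgb_blocks(variant: str) -> int:
    """Total DGBblock count, assuming blocks = modules × repeats per stage."""
    column = {"DGLNet-18": 4, "DGLNet-30": 5}[variant]
    return sum(stage[3] * stage[column] for stage in STAGES)


for name, strides, channels, _, _, _ in STAGES:
    print(name, list(zip(channels, branch_resolutions(strides))))
print("DGLNet-18 blocks:", total_dgb_blocks("DGLNet-18"))
print("DGLNet-30 blocks:", total_dgb_blocks("DGLNet-30"))
```

Under this reading the 4× branch keeps the 64×64 output size listed in the table, and the two variants differ only in how many modules each stage stacks.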

Category	Model	Backbone	Input size	Params/10⁶	Computation/GFLOPs	AP/%	AP⁵⁰/%	AP⁷⁵/%	AP^M/%	AP^L/%	AR/%
Large networks	Hourglass [4]	Hourglass	256×192	25.1	14.3	66.9	-	-	-	-	-
	CPN [19]	ResNet-50	256×192	27.0	6.2	68.6	-	-	-	-	-
	SimpleBaseline [2]	ResNet-50	256×192	34.0	8.9	70.4	88.6	78.3	67.1	77.2	76.3
	HRNetV1 [7]	HRNetV1-W32	256×192	28.5	7.1	73.4	89.5	80.7	70.2	80.1	78.9
	DARK [38]	HRNetV1-W48	128×96	63.6	3.6	71.9	89.1	79.6	69.2	78.0	77.9
Small networks	MobileNetV2 [9]	MobileNetV2	256×192	9.6	1.4	64.6	87.4	72.3	61.1	71.2	70.7
	MobileNetV2	MobileNetV2	384×288	9.6	3.3	67.3	87.9	74.3	62.8	74.7	72.9
	ShuffleNetV2 [12]	ShuffleNetV2	256×192	7.6	1.2	59.9	85.4	66.3	56.6	66.2	66.4
	ShuffleNetV2	ShuffleNetV2	384×288	7.6	2.8	63.6	86.5	70.5	59.5	70.7	69.7
	Small HRNet [18]	HRNet-W16	256×192	1.3	0.5	55.2	83.7	62.4	52.3	61.0	62.1
	Small HRNet	HRNet-W16	384×288	1.3	1.2	56.0	83.8	63.0	52.4	62.6	62.6
	DY-MobileNetV2 [39]	DY-MobileNetV2	256×192	16.1	1.0	68.2	88.4	76.0	65.0	74.7	74.2
	DY-RELU [40]	MobileNetV2	256×192	9.0	1.0	68.1	88.5	76.2	64.8	74.3	-
	Lite-HRNet [24]	Lite-HRNet-18	256×192	1.1	0.2	64.8	86.7	73.0	62.1	70.5	71.2
	Lite-HRNet	Lite-HRNet-18	384×288	1.1	0.4	67.6	87.8	75.0	64.5	73.7	73.7
	Lite-HRNet	Lite-HRNet-30	256×192	1.8	0.3	67.2	88.0	75.0	64.3	73.1	73.3
	Lite-HRNet	Lite-HRNet-30	384×288	1.8	0.7	70.4	88.7	77.7	67.5	76.3	76.2
	DGLNet	DGLNet-18	256×192	1.1	0.2	66.1	89.4	73.2	64.0	71.9	71.8
	DGLNet	DGLNet-18	384×288	1.1	0.4	68.5	89.5	76.0	65.9	73.9	74.1
	DGLNet	DGLNet-30	256×192	1.8	0.3	68.4	89.7	76.1	65.9	74.2	73.8
	DGLNet	DGLNet-30	384×288	1.8	0.7	71.9	89.9	78.2	68.8	77.3	76.9
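
The headline figures in the abstract (4.6 and 8.3 percentage points of accuracy gain at 21.2% and 25.0% of the computation) can be recomputed from the 384×288 rows of the table above. The short check below does only that arithmetic; the variable and function names are illustrative.

```python
# (AP in %, computation in GFLOPs) at 384×288 input, copied from the table above
DGLNET_30 = (71.9, 0.7)
BASELINES = {
    "MobileNetV2": (67.3, 3.3),
    "ShuffleNetV2": (63.6, 2.8),
}


def gain_and_share(ours, other):
    """Return (AP gain in percentage points, our computation as % of theirs)."""
    ap_gain = round(ours[0] - other[0], 1)
    flops_share = round(100 * ours[1] / other[1], 1)
    return ap_gain, flops_share


for name, row in BASELINES.items():
    gain, share = gain_and_share(DGLNET_30, row)
    print(f"vs {name}: +{gain} AP points at {share}% of the computation")
```

Both pairs reproduce the abstract's numbers, which suggests that the abstract compares DGLNet-30 with each baseline at the same 384×288 input size.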

Category	Model	Backbone	Input size	Params/10⁶	Computation/GFLOPs	AP/%	AP⁵⁰/%	AP⁷⁵/%	AP^M/%	AP^L/%	AR/%
Large networks	SimpleBaseline [2]	ResNet-50	256×192	34.0	8.9	70.0	90.9	77.9	66.8	75.8	75.6
	CPN [19]	ResNet-Inception	384×288	-	-	72.1	91.4	80.0	68.7	77.2	78.5
	HRNetV1 [7]	HRNetV1-W32	384×288	28.5	16.0	74.9	92.5	82.8	71.3	80.9	80.1
	DARK [38]	HRNetV1-W48	384×288	63.6	32.9	76.2	92.5	83.6	72.5	82.4	81.1
Small networks	MobileNetV2 [9]	MobileNetV2	384×288	9.8	3.3	66.8	90.0	74.0	62.6	73.3	72.3
	ShuffleNetV2 [12]	ShuffleNetV2	384×288	7.6	2.8	62.9	88.5	69.4	58.9	69.3	68.9
	Small HRNet [18]	HRNet-W16	384×288	1.3	1.2	55.2	85.8	61.4	51.7	61.2	61.5
	Lite-HRNet [24]	Lite-HRNet-18	384×288	1.1	0.4	66.9	89.4	74.4	64.0	72.2	72.6
	Lite-HRNet	Lite-HRNet-30	384×288	1.8	0.7	69.7	90.7	77.5	66.9	75.0	75.4
	DGLNet	DGLNet-18	384×288	1.1	0.4	68.6	90.1	75.7	65.3	74.0	74.4
	DGLNet	DGLNet-30	384×288	1.8	0.7	71.0	90.9	77.9	67.3	76.5	76.7
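
The AP, AP⁵⁰ and AP⁷⁵ columns follow the COCO keypoint convention [37], in which a predicted pose is matched to the ground truth by Object Keypoint Similarity (OKS) and AP⁵⁰ counts a match when OKS is at least 0.5. The sketch below follows the standard COCO definition as I understand it (per-keypoint constant k = 2σ, normalisation by object area); the coordinates and σ values in the example are made up for illustration and are not from the paper.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one true pose.

    pred, gt : lists of (x, y) keypoints
    visible  : list of visibility flags (> 0 means the keypoint is labelled)
    area     : object area in pixels squared (the scale term s**2)
    sigmas   : per-keypoint falloff constants
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabelled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


# Illustrative two-keypoint pose; sigma = 0.05 is a placeholder value.
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
pred_pose = [(11.0, 11.0), (20.0, 20.0)]
score = oks(pred_pose, gt_pose, visible=[2, 2], area=100.0, sigmas=[0.05, 0.05])
print(f"OKS = {score:.3f}, counts at AP50: {score >= 0.5}")
```

AP then averages precision over a range of OKS thresholds, while AP^M and AP^L restrict the evaluation to medium and large objects.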

Model	Params/10⁶	Computation/GFLOPs	PCKh@0.5/%
MobileNetV2 [9]	9.6	1.9	85.4
MobileNetV3 [10]	8.7	1.8	84.3
ShuffleNetV2 [12]	7.6	1.7	82.8
Small HRNet-W16	1.3	0.7	80.2
Lite-HRNet-18 [24]	1.1	0.2	86.1
Lite-HRNet-30	1.8	0.4	87.0
DGLNet-18 (ours)	1.1	0.2	86.8
DGLNet-30 (ours)	1.8	0.4	87.7
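
PCKh@0.5, the metric in the table above, is the head-normalised Percentage of Correct Keypoints commonly reported with the benchmark of reference [41]: a predicted joint counts as correct when its distance to the ground truth is at most 0.5 times the person's head size. The following is a minimal sketch of that definition; the function name and the sample coordinates are illustrative, and the head size is taken as a given input rather than derived from a head box.

```python
import math


def pckh(preds, gts, head_sizes, alpha=0.5):
    """Percentage of joints within alpha * head size of the ground truth.

    preds, gts : one list of (x, y) joints per person
    head_sizes : one head size (in pixels) per person
    """
    correct = total = 0
    for pred, gt, head in zip(preds, gts, head_sizes):
        for (px, py), (gx, gy) in zip(pred, gt):
            total += 1
            if math.hypot(px - gx, py - gy) <= alpha * head:
                correct += 1
    return 100.0 * correct / total if total else 0.0


# One person with four joints and a head size of 10 px (threshold 5 px).
gt_joints = [[(0, 0), (10, 10), (20, 20), (30, 30)]]
pred_joints = [[(0, 3), (10, 10), (20, 26), (30, 30)]]
print(pckh(pred_joints, gt_joints, head_sizes=[10]))  # 3 of 4 joints are within 5 px
```

Because the threshold scales with head size, the metric is comparable across people who appear at different scales in the image.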

Model	Params/10⁶	Computation/GFLOPs	AP/%
Small HRNet	1.30	0.50	55.2
Small HRNet+DFDbottleneck	1.34	0.51	59.8
Small HRNet+DGBblock	1.13	0.34	61.7
Small HRNet+GSCtransition	1.21	0.39	60.2
DGLNet-18	1.10	0.21	66.3
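
The contribution of each module in the ablation table above is easier to read as a difference from the Small HRNet baseline. The snippet below simply recomputes those differences from the table values; the names are illustrative.

```python
# model: (params in 10^6, computation in GFLOPs, AP in %), copied from the table
ABLATION = {
    "Small HRNet": (1.30, 0.50, 55.2),
    "Small HRNet+DFDbottleneck": (1.34, 0.51, 59.8),
    "Small HRNet+DGBblock": (1.13, 0.34, 61.7),
    "Small HRNet+GSCtransition": (1.21, 0.39, 60.2),
    "DGLNet-18": (1.10, 0.21, 66.3),
}


def delta_vs_baseline(name, baseline="Small HRNet"):
    """Change in (params, GFLOPs, AP) relative to the baseline row."""
    params, flops, ap = ABLATION[name]
    b_params, b_flops, b_ap = ABLATION[baseline]
    return (
        round(params - b_params, 2),
        round(flops - b_flops, 2),
        round(ap - b_ap, 1),
    )


for model in ABLATION:
    if model != "Small HRNet":
        d_params, d_flops, d_ap = delta_vs_baseline(model)
        print(f"{model}: params {d_params:+.2f}, GFLOPs {d_flops:+.2f}, AP {d_ap:+.1f}")
```

The individual gains (4.6, 6.5 and 5.0 AP points) sum to more than the 11.1 points of the full model, so the modules' contributions overlap rather than add up independently.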
