Journal of Computer Applications, 2025, Vol. 45, Issue (10): 3179-3186. DOI: 10.11772/j.issn.1001-9081.2024091351
• Artificial intelligence •
Zhuoran LI¹, Hua LI¹, Tong WANG², Chaozhe JIANG²
Received: 2024-09-23
Revised: 2024-11-27
Accepted: 2024-12-02
Online: 2024-12-20
Published: 2025-10-10
Contact: Hua LI
About author: LI Zhuoran, born in 2000, from Dazhou, Sichuan, M. S. candidate, CCF member. His research interests include computer vision and human pose estimation.
Zhuoran LI, Hua LI, Tong WANG, Chaozhe JIANG. Lightweight human pose estimation based on merge state space model[J]. Journal of Computer Applications, 2025, 45(10): 3179-3186.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024091351
Method | Parameters/10^6 | AP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
---|---|---|---|---|---|---|---|
YOLO-Pose | 15.1 | 63.8 | 87.6 | 69.6 | — | 73.1 | 70.4 |
KAPAO | 12.6 | 64.4 | — | — | — | — | 71.5 |
Lite Pose | 2.7 | 56.8 | — | — | — | — | — |
LiteDEKR | 5.7 | 70.1 | 87.9 | 75.8 | 71.0 | — | — |
Lite‑HRNet | 1.8 | 67.2 | 88.0 | 75.0 | 64.3 | 73.1 | 73.3 |
HF‑HRNet | 7.4 | 70.8 | 88.9 | 78.0 | 67.6 | 77.3 | 76.5 |
EANet | 1.9 | 68.8 | 88.3 | 76.9 | 65.9 | 74.8 | 74.8 |
Light‑HRNet | 1.8 | 67.0 | 70.0 | 74.6 | — | 74.4 | 73.0 |
SimCC | 25.7 | 70.8 | 86.4 | 77.5 | 66.5 | 75.5 | 75.1 |
HigherHRNet | 63.8 | 70.5 | 89.3 | 77.2 | 66.6 | 75.8 | 74.9 |
DGLNet | 1.8 | 68.4 | 89.7 | 76.1 | 65.9 | 74.2 | 73.8 |
IDPNet | 4.2 | 72.6 | 91.6 | 80.4 | 69.8 | 76.9 | 75.4 |
Lite‑SimCC | 3.3 | 71.8 | 91.7 | 79.5 | 69.7 | 76.0 | 74.8 |
Tab. 1 Experimental results of different methods on COCO2017 dataset
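Method | Parameters/10^6 | Head precision/% | Shoulder precision/% | Elbow precision/% | Wrist precision/% | Hip precision/% | Knee precision/% | Ankle precision/% | Mean precision/% |
---|---|---|---|---|---|---|---|---|---|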
Lite-HRNet | 1.8 | 95.2 | 93.5 | 84.7 | 78.1 | 86.2 | 78.9 | 73.9 | 85.1 |
Dite-HRNet | 1.8 | — | — | — | — | — | — | — | 87.6 |
HF-HRNet | 7.4 | — | — | — | — | — | — | — | 88.5 |
Light-HRNet | 1.8 | 93.0 | — | — | 86.4 | 88.5 | 84.1 | 88.4 | 87.9 |
SimCC | 25.7 | 96.8 | 95.9 | 90.0 | 85.0 | 89.1 | 85.4 | 81.3 | 89.6 |
DGLNet | 1.8 | — | — | — | — | — | — | — | 87.7 |
Lite-NIRNet | 7.7 | 96.9 | 90.4 | 95.8 | 85.1 | 89.0 | 85.7 | 81.3 | 89.7 |
IDPNet | 4.2 | 96.8 | 95.2 | 88.7 | 84.0 | 88.1 | 84.1 | 79.1 | 88.6 |
TokenPose-L | 23.5 | 97.2 | 95.8 | 90.7 | 85.9 | 89.2 | 86.2 | 82.3 | 90.1 |
Lite-SimCC | 3.3 | 95.5 | 94.8 | 90.1 | 86.6 | 89.4 | 85.4 | 81.2 | 89.2 |
Tab. 2 Experimental results of different methods on MPII dataset
Long sequence processing module | Parameters/10^6 | AP/% | AR/% |
---|---|---|---|
None | 1.9 | 68.5 | 70.4 |
Transformer | 3.7 | 69.8 | 72.5 |
Mamba | 3.2 | 70.9 | 76.1 |
MergeMamba | 3.3 | 71.8 | 74.8 |
Tab. 3 Ablation experimental results of long sequence processing module
Loss function | | AP/% | AR/% |
---|---|---|---|
One-hot based | — | 70.6 | 72.5 |
Soft-label based | 1 | 70.9 | 72.8 |
Soft-label based | 2 | 71.8 | 74.8 |
Soft-label based | 3 | 71.6 | 74.5 |
Soft-label based | 4 | 71.5 | 74.2 |
Tab. 4 Ablation experimental results of loss function