Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (4): 1293-1299.DOI: 10.11772/j.issn.1001-9081.2024040507
• Multimedia computing and computer simulation •
Shiyue GUO1, Jianwu DANG1,2, Yangping WANG1,2, Jiu YONG1,2
Received: 2024-04-25
Revised: 2024-07-17
Accepted: 2024-07-18
Online: 2025-04-08
Published: 2025-04-10
Contact: Shiyue GUO
About author: DANG Jianwu, born in 1963 in Weinan, Shaanxi, Ph. D., professor. His research interests include intelligent information processing and artificial intelligence.
Shiyue GUO, Jianwu DANG, Yangping WANG, Jiu YONG. 3D hand pose estimation combining attention mechanism and multi-scale feature fusion[J]. Journal of Computer Applications, 2025, 45(4): 1293-1299.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024040507
Algorithm | APh /% | MPJPE/mm (single hand) | MPJPE/mm (two hands) | MRRPE/mm |
---|---|---|---|---|
Ref. [ | 99.20 | 45.74 | 51.44 | 41.45 |
InterNet[ | 99.14 | 12.16 | 16.02 | 32.59 |
DIGIT[ | 99.15 | 11.32 | 15.57 | 30.51 |
Ref. [ | 98.97 | 11.10 | 15.14 | 30.92 |
Ref. [ | — | 10.99 | 14.34 | 29.63 |
Ref. [ | — | 8.51 | 13.12 | — |
Ref. [ | 99.02 | 9.10 | 12.82 | 31.37 |
Proposed algorithm | 99.35 | 9.96 | 12.32 | 29.57 |
Tab. 1 Comparison results of different algorithms on InterHand2.6M dataset
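This page does not define the metrics reported in Tab. 1. The sketch below assumes the definitions commonly used with the InterHand2.6M benchmark: MPJPE as the mean Euclidean distance between predicted and ground-truth joints after aligning each hand to its root joint, and MRRPE as the mean error of the left-hand root position expressed relative to the right-hand root. The function names `mpjpe` and `rrpe`, the choice of joint 0 as the root, and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math

def mpjpe(pred, gt, root=0):
    """Mean per-joint position error (same unit as the inputs, e.g. mm).

    pred, gt: lists of (x, y, z) joints for one hand.
    root: index of the root joint used for alignment (assumed to be 0 here).
    """
    def align(joints):
        rx, ry, rz = joints[root]
        return [(x - rx, y - ry, z - rz) for x, y, z in joints]

    p, g = align(pred), align(gt)
    return sum(math.dist(a, b) for a, b in zip(p, g)) / len(g)

def rrpe(pred_right_root, pred_left_root, gt_right_root, gt_left_root):
    """Relative-root position error for one sample.

    Compares the left-hand root position relative to the right-hand root
    in the prediction and in the ground truth. Averaging this value over
    all two-hand samples gives MRRPE.
    """
    pred_rel = tuple(l - r for l, r in zip(pred_left_root, pred_right_root))
    gt_rel = tuple(l - r for l, r in zip(gt_left_root, gt_right_root))
    return math.dist(pred_rel, gt_rel)

# Toy example with three joints (coordinates in mm, made up for illustration).
gt_hand = [(0, 0, 0), (10, 0, 0), (0, 10, 0)]
pred_hand = [(5, 5, 5), (15, 5, 5), (5, 18, 9)]
print(mpjpe(pred_hand, gt_hand))
print(rrpe((10, 10, 10), (113, 14, 10), (0, 0, 0), (100, 0, 0)))
```

Because both hands are root-aligned before MPJPE is computed, a global offset of the prediction does not affect it; MRRPE is the metric that captures how well the two hands are placed relative to each other.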
Algorithm | Parameters/10^6 | Computation/GFLOPs | InterHand2.6M test time/min | InterHand2.6M training time per epoch/h |
---|---|---|---|---|
Ref. [ | 13.61 | 30.52 | 34.45 | 4.08 |
InterNet[ | 47.31 | 23.24 | 35.36 | 3.81 |
Proposed algorithm | 38.73 | 21.57 | 68.21 | 6.03 |
Tab. 2 Comparison of parameters and computation complexity among different algorithms
Algorithm | GT_S | GT_H | EPE/mm |
---|---|---|---|
Ref. [ | Yes | Yes | 30.42 |
Ref. [ | Yes | Yes | 19.95 |
Ref. [ | Yes | Yes | 20.74 |
Ref. [ | Yes | Yes | 19.73 |
(as above) | No | No | 22.53 |
InterNet[ | No | No | 20.89 |
QMGR-Net[ | No | No | 18.59 |
Proposed algorithm | No | No | 18.21 |
Tab. 3 Comparison of proposed algorithm and existing algorithms on RHD dataset
Algorithm | MPJPE/mm (single hand) | MPJPE/mm (two hands) | MRRPE/mm |
---|---|---|---|
Without SEM module | 11.59 | 13.76 | 31.45 |
Without SS-MIFM module | 13.16 | 17.02 | 34.49 |
Without 2.5D pose estimation module | 12.32 | 15.57 | 32.51 |
All modules | 9.96 | 12.32 | 29.57 |
Tab. 4 Ablation experimental data analysis
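To read the ablation more easily, the error increase caused by removing each module can be computed directly from the values in Tab. 4. The short sketch below only does this arithmetic; the dictionary keys and the helper name `error_increase` are illustrative, and the numbers are copied from the table (single-hand MPJPE, two-hand MPJPE, MRRPE).

```python
# Values copied from Tab. 4: (single-hand MPJPE, two-hand MPJPE, MRRPE).
full = (9.96, 12.32, 29.57)
ablations = {
    "without SEM": (11.59, 13.76, 31.45),
    "without SS-MIFM": (13.16, 17.02, 34.49),
    "without 2.5D pose estimation": (12.32, 15.57, 32.51),
}

def error_increase(ablated, reference):
    """Per-metric increase in error relative to the full model."""
    return tuple(round(a - r, 2) for a, r in zip(ablated, reference))

increases = {name: error_increase(vals, full) for name, vals in ablations.items()}

# Variant whose removal raises the single-hand MPJPE the most.
most_harmful = max(increases, key=lambda name: increases[name][0])

for name, inc in increases.items():
    print(name, inc)
print("largest single-hand degradation:", most_harmful)
```

On these numbers, removing the SS-MIFM module produces the largest increase on all three metrics, followed by the 2.5D pose estimation module and then the SEM module.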
1 | ISMAIL A W, ALADIN M Y F, HALIM N A A, et al. Augmented reality using gesture and speech accelerates user interaction[C]// Proceedings of the 2022 International Conference on Advanced Communication and Intelligent Systems, CCIS 1749. Cham: Springer, 2023: 233-244. |
2 | QI T D, BOYD L, FITZPATRICK S, et al. Towards a virtual reality visualization of hand-object interactions to support remote physical therapy[C]// Proceedings of the 2023 International Conference on Ubiquitous Computing and Ambient Intelligence, LNNS 835. Cham: Springer, 2023: 136-147. |
3 | ZHOU Y, JIANG G, LIN Y. A novel finger and hand pose estimation technique for real-time hand gesture recognition[J]. Pattern Recognition, 2016, 49: 102-114. |
4 | SRIDHAR S, FEIT A M, THEOBALT C, et al. Investigating the dexterity of multi-finger input for mid-air text entry[C]// Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. New York: ACM, 2015: 3643-3652. |
5 | OBERWEGER M, LEPETIT V. DeepPrior++: improving fast and accurate 3D hand pose estimation[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2017: 585-594. |
6 | CHANG J Y, MOON G, LEE K M. V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5079-5088. |
7 | MUELLER F, MEHTA D, SOTNYCHENKO O, et al. Real-time hand tracking under occlusion from an egocentric RGB-D sensor[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1163-1172. |
8 | KADKHODAMOHAMMADI A, PADOY N. A generalizable approach for multi-view 3D human pose regression[J]. Machine Vision and Applications, 2021, 32: No.6. |
9 | ISKAKOV K, BURKOV E, LEMPITSKY V, et al. Learnable triangulation of human pose[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7717-7726. |
10 | SIMON T, JOO H, MATTHEWS I, et al. Hand keypoint detection in single images using multiview bootstrapping[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4645-4653. |
11 | ZIMMERMANN C, BROX T. Learning to estimate 3D hand pose from single RGB images[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 4913-4921. |
12 | CAI Y, GE L, CAI J, et al. Weakly-supervised 3D hand pose estimation from monocular RGB images[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 678-694. |
13 | IQBAL U, MOLCHANOV P, BREUEL T, et al. Hand pose estimation via latent 2.5D heatmap regression[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 125-143. |
14 | SPURR A, IQBAL U, MOLCHANOV P, et al. Weakly supervised 3D hand pose estimation via biomechanical constraints[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12362. Cham: Springer, 2020: 211-228. |
15 | ZHOU Y, HABERMANN M, XU W, et al. Monocular real-time hand shape and motion capture using multi-modal data[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5345-5354. |
16 | MOON G, YU S I, WEN H, et al. InterHand2.6M: a dataset and baseline for 3D interacting hand pose estimation from a single RGB image[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12365. Cham: Springer, 2020: 548-564. |
17 | NI H, XIE S, XU P, et al. QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation[J]. International Journal of Machine Learning and Cybernetics, 2023, 14(12): 4029-4045. |
18 | LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. |
19 | SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696. |
20 | NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499. |
21 | GUPTA D, ARTACHO B, SAVAKIS A. HandyPose: multi-level framework for hand pose estimation[J]. Pattern Recognition, 2022, 128: No.108674. |
22 | GUAN X, SHEN H, NYATEGA C O, et al. Repeated cross-scale structure-induced feature fusion network for 2D hand pose estimation[J]. Entropy, 2023, 25(5): No.724. |
23 | XIAO Y, YU D, WANG X, et al. SPCNet: spatial preserve and content-aware network for human pose estimation[C]// Proceedings of the 24th European Conference on Artificial Intelligence. Amsterdam: IOS Press, 2020: 2776-2783. |
24 | SCHLEMPER J, OKTAY O, SCHAAP M, et al. Attention gated networks: learning to leverage salient regions in medical images[J]. Medical Image Analysis, 2019, 53: 197-207. |
25 | FAN Z, SPURR A, KOCABAS M, et al. Learning to disambiguate strongly interacting hands via probabilistic per-pixel part segmentation[C]// Proceedings of the 2021 International Conference on 3D Vision. Piscataway: IEEE, 2021: 1-10. |
26 | JIA D, LI Y Y, AN T, et al. Complex gesture pose estimation network fusing multiscale features[J]. Journal of Image and Graphics, 2023, 28(9): 2887-2898. |
27 | HAMPALI S, SARKAR S D, RAD M, et al. Keypoint Transformer: solving joint identification in challenging hands and object interactions for accurate 3D pose estimation[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11080-11090. |
28 | MENG H, JIN S, LIU W, et al. 3D interacting hand pose estimation by hand de-occlusion and removal[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13666. Cham: Springer, 2022: 380-397. |
29 | GAO C, YANG Y, LI W. 3D interacting hand pose and shape estimation from a single RGB image[J]. Neurocomputing, 2022, 474: 25-36. |
30 | YANG L, YAO A. Disentangling latent hands for image synthesis and pose estimation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9869-9878. |
31 | ZHAO L, PENG X, CHEN Y, et al. Knowledge as priors: cross-modal knowledge generalization for datasets without superior knowledge[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6527-6536. |