Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 4012-4020.DOI: 10.11772/j.issn.1001-9081.2024111715
• Multimedia computing and computer simulation •
Hand pose estimation based on mask prompts and attention
Jianhua REN1, Jiahui CAO1, Di JIA1,2
Received: 2024-12-05
Revised: 2025-04-10
Accepted: 2025-04-15
Online: 2025-04-22
Published: 2025-12-10
Contact: Jiahui CAO
About author: REN Jianhua, born in 1973, a native of Shenyang, Liaoning, M. S., associate professor. His research interests include data mining, machine learning and image processing.
Jianhua REN, Jiahui CAO, Di JIA. Hand pose estimation based on mask prompts and attention[J]. Journal of Computer Applications, 2025, 45(12): 4012-4020.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024111715
| Method | AUC/% (ALL) | AUC/% (Multi-Hand) | MPJPE/px (ALL) | MPJPE/px (Multi-Hand) | Speed/ms |
|---|---|---|---|---|---|
| LightWeightHand | 86.25 | 62.13 | 2.97 | 3.17 | 38 |
| InterNet | 88.05 | 62.32 | 2.77 | 3.16 | 42 |
| SRHandNet | 91.26 | 62.54 | 2.30 | 3.13 | 33 |
| UDA-PE | 92.68 | 64.52 | 2.22 | 3.05 | 32 |
| ZoomNAS | 92.79 | 64.48 | 2.19 | 3.02 | 31 |
| UVP | 92.01 | 63.72 | 2.28 | 3.11 | 56 |
| DMFusion | 92.08 | 63.95 | 2.24 | 3.07 | 47 |
| HMCA | 93.22 | 66.21 | 2.19 | 2.95 | 90 |
Tab. 1 Comparison of experimental results on RHD
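The tables report two metrics, AUC and MPJPE. As a rough illustration of how such numbers are commonly computed for 2D keypoints, the sketch below uses the standard definition of MPJPE (mean Euclidean distance per joint) and takes AUC to be the normalised area under a PCK curve. The source does not state the threshold range or normalisation it uses, so `thresholds` and the function names `mpjpe` and `pck_auc` are assumptions made for illustration only.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance (pixels)
    between predicted and ground-truth 2D keypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pck_auc(pred, gt, thresholds):
    """Area under the PCK curve, normalised to [0, 1].

    PCK(t) is the fraction of keypoints whose error is at most t.
    `thresholds` must be increasing; its range is an assumption here,
    not a value taken from the paper.
    """
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    pck = [sum(e <= t for e in errors) / len(errors) for t in thresholds]
    # Trapezoidal integration over the threshold range.
    area = sum(
        (thresholds[i + 1] - thresholds[i]) * (pck[i] + pck[i + 1]) / 2
        for i in range(len(thresholds) - 1)
    )
    return area / (thresholds[-1] - thresholds[0])
```

For example, with predictions `[(0, 0), (3, 4)]` against ground truth `[(0, 0), (0, 0)]`, the per-joint errors are 0 px and 5 px, so `mpjpe` returns 2.5. Multiplying the result of `pck_auc` by 100 gives a percentage comparable in form to the AUC/% columns.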
| Method | AUC/% (ALL) | AUC/% (Multi-Hand) | MPJPE/px (ALL) | MPJPE/px (Multi-Hand) |
|---|---|---|---|---|
| LightWeightHand | 85.26 | 60.23 | 2.86 | 3.15 |
| InterNet | 86.98 | 60.32 | 2.65 | 3.12 |
| SRHandNet | 88.25 | 60.65 | 2.38 | 3.10 |
| UDA-PE | 90.05 | 61.62 | 2.16 | 3.03 |
| ZoomNAS | 90.19 | 61.96 | 2.13 | 2.98 |
| UVP | 89.05 | 60.95 | 2.22 | 3.08 |
| DMFusion | 89.53 | 61.13 | 2.20 | 3.05 |
| HMCA | 91.38 | 62.68 | 2.06 | 2.90 |
Tab. 2 Comparison of experimental results on the CMU Panoptic dataset
| Loss function | AUC/% | MPJPE/px |
|---|---|---|
| MSE | 89.20 | 2.25 |
| MSE+Mask | 91.38 | 2.06 |
Tab. 3 Loss function ablation results on the CMU Panoptic dataset
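The ablation above compares a plain MSE loss with an "MSE+Mask" loss. The page does not say how the mask term is defined or weighted, so the following is only a minimal sketch of one plausible form: a heatmap MSE term plus a binary cross-entropy term on a predicted hand mask. The choice of cross-entropy, the weight `lam`, and all function names are hypothetical and are not taken from the paper.

```python
import math


def mse(pred, target):
    """Mean squared error over flattened heatmap values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy over flattened mask probabilities.
    ASSUMPTION: the paper's mask term may be defined differently."""
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(prob)


def combined_loss(heat_pred, heat_gt, mask_prob, mask_gt, lam=1.0):
    """Heatmap MSE plus a weighted mask term (lam is a hypothetical weight)."""
    return mse(heat_pred, heat_gt) + lam * bce(mask_prob, mask_gt)
```

Setting `lam=0.0` reduces `combined_loss` to the plain MSE case, which mirrors the two rows of the ablation in structure only; it is not a reproduction of the reported numbers.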
| Module | AUC/% | MPJPE/px |
|---|---|---|
| No attention | 70.18 | 4.66 |
| MAB | 81.56 | 3.85 |
| MSB | 85.98 | 3.53 |
| PAB | 91.38 | 2.06 |
Tab. 4 Attention module ablation results on the CMU Panoptic dataset
| Model | AUC/% | MPJPE/px |
|---|---|---|
| Single path | 90.65 | 2.18 |
| Two paths | 90.88 | 2.10 |
| Three paths | 91.38 | 2.06 |
| Four paths | 91.25 | 2.09 |
Tab. 5 MRB ablation results on the CMU Panoptic dataset
| Model | AUC/% | MPJPE/px |
|---|---|---|
| Model 1 | 89.28 | 2.32 |
| Model 2 | 91.38 | 2.06 |
Tab. 6 Mask attention ablation results on the CMU Panoptic dataset