Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 4012-4020. DOI: 10.11772/j.issn.1001-9081.2024111715
Jianhua REN1, Jiahui CAO1, Di JIA1,2
Received: 2024-12-05
Revised: 2025-04-10
Accepted: 2025-04-15
Online: 2025-04-22
Published: 2025-12-10
Contact: Jiahui CAO
About author: REN Jianhua, born in 1973 in Shenyang, Liaoning, M. S., associate professor. His research interests include data mining, machine learning, and image processing.
Abstract:
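Hand pose estimation is an important research direction in computer vision. Traditional methods are easily disturbed by complex backgrounds, and deep learning methods, although more robust to such interference, still fall short in multi-hand scenes and in recognizing fine details. To address this, a hand pose estimation method based on mask prompts and an attention mechanism, HMCA (Hand Mask Prompts and Attention), is proposed. Firstly, object detection and semantic segmentation are used to generate a hand mask map, which shields background noise and provides prior information. Secondly, a Parallel Attention Block (PAB) and a Multi-path Residual Block (MRB) are designed to extract multi-scale features, which improves recognition of complex gestures, reduces computational complexity, and prevents vanishing gradients. Thirdly, the mask map is used to guide the model to focus on the hand region, which addresses the multi-hand and occlusion problems. Finally, a penalty term is added to the regression loss to constrain keypoint prediction and speed up model convergence. Experimental results show that the method outperforms the compared methods in single-hand, multi-hand, and occlusion scenes, and achieves the best performance in both the mean Area Under the Curve (AUC) over different thresholds and the Mean Per Joint Position Error (MPJPE). On RHD (Rendered Handpose Dataset), the method achieves an AUC of 93.22% over different thresholds and an MPJPE of 2.15; on the CMU Panoptic dataset, it achieves an AUC of 91.38% over different thresholds and a mean hand joint error of 2.06.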
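The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes 2D keypoints given as `(x, y)` pixel tuples and takes AUC as the normalized area under the curve of correct-keypoint rate versus distance threshold; the threshold range and the function names `mpjpe`, `pck`, and `pck_auc` are illustrative assumptions, not details taken from the paper.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

def pck(pred, gt, threshold):
    """Fraction of joints whose error is within the given distance threshold."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(d <= threshold for d in dists) / len(dists)

def pck_auc(pred, gt, thresholds):
    """Area under the PCK-vs-threshold curve (trapezoidal rule), scaled to [0, 1].

    `thresholds` must be sorted in increasing order; the range is an assumption.
    """
    accs = [pck(pred, gt, t) for t in thresholds]
    area = sum(
        (thresholds[i + 1] - thresholds[i]) * (accs[i] + accs[i + 1]) / 2
        for i in range(len(thresholds) - 1)
    )
    return area / (thresholds[-1] - thresholds[0])

# Two joints: one predicted exactly, one off by 5 pixels.
pred = [(0.0, 0.0), (3.0, 4.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]
error = mpjpe(pred, gt)
auc = pck_auc(pred, gt, [0.0, 5.0, 10.0])
```

A lower MPJPE and a higher AUC both indicate better keypoint localization, which is how the result tables should be read.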
Jianhua REN, Jiahui CAO, Di JIA. Hand pose estimation based on mask prompts and attention[J]. Journal of Computer Applications, 2025, 45(12): 4012-4020.
| Method | AUC/% (ALL) | AUC/% (Multi-Hand) | MPJPE/px (ALL) | MPJPE/px (Multi-Hand) | Speed/ms |
|---|---|---|---|---|---|
| LightWeightHand | 86.25 | 62.13 | 2.97 | 3.17 | 38 |
| InterNet | 88.05 | 62.32 | 2.77 | 3.16 | 42 |
| SRHandNet | 91.26 | 62.54 | 2.30 | 3.13 | 33 |
| UDA-PE | 92.68 | 64.52 | 2.22 | 3.05 | 32 |
| ZoomNAS | 92.79 | 64.48 | 2.19 | 3.02 | 31 |
| UVP | 92.01 | 63.72 | 2.28 | 3.11 | 56 |
| DMFusion | 92.08 | 63.95 | 2.24 | 3.07 | 47 |
| HMCA | 93.22 | 66.21 | 2.19 | 2.95 | 90 |
Tab. 1 Comparison experimental results on RHD
| Method | AUC/% (ALL) | AUC/% (Multi-Hand) | MPJPE/px (ALL) | MPJPE/px (Multi-Hand) |
|---|---|---|---|---|
| LightWeightHand | 85.26 | 60.23 | 2.86 | 3.15 |
| InterNet | 86.98 | 60.32 | 2.65 | 3.12 |
| SRHandNet | 88.25 | 60.65 | 2.38 | 3.10 |
| UDA-PE | 90.05 | 61.62 | 2.16 | 3.03 |
| ZoomNAS | 90.19 | 61.96 | 2.13 | 2.98 |
| UVP | 89.05 | 60.95 | 2.22 | 3.08 |
| DMFusion | 89.53 | 61.13 | 2.20 | 3.05 |
| HMCA | 91.38 | 62.68 | 2.06 | 2.90 |
Tab. 2 Comparison experimental results on CMU Panoptic dataset
| Loss function | AUC/% | MPJPE/px |
|---|---|---|
| MSE | 89.20 | 2.25 |
| MSE+Mask | 91.38 | 2.06 |
Tab. 3 Loss function ablation experimental results on CMU Panoptic dataset
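To make the MSE versus MSE+Mask comparison concrete, here is a minimal sketch of one plausible form of a mask-based penalty on a heatmap regression loss. The exact penalty used by the authors is not given on this page, so the form below (squared heatmap response outside the hand mask), the `weight` value, and all function names are assumptions for illustration only.

```python
def mse(pred, target):
    """Mean squared error between two 2D heatmaps given as nested lists."""
    n = sum(len(row) for row in pred)
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / n

def mask_penalty(pred, mask):
    """Assumed penalty: mean squared heatmap response outside the hand mask.

    mask holds 1 for hand pixels and 0 for background pixels.
    """
    n = sum(len(row) for row in pred)
    total = sum(
        (p * (1 - m)) ** 2
        for pred_row, mask_row in zip(pred, mask)
        for p, m in zip(pred_row, mask_row)
    )
    return total / n

def masked_loss(pred, target, mask, weight=0.5):
    """MSE plus a weighted mask penalty (weight is a hypothetical hyperparameter)."""
    return mse(pred, target) + weight * mask_penalty(pred, mask)

# A 2x2 heatmap with a spurious 0.5 response on a background pixel.
pred = [[1.0, 0.5], [0.0, 0.0]]
target = [[1.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [0, 0]]
loss = masked_loss(pred, target, mask)
```

In this toy case the background response is counted twice, once by the MSE term and once by the penalty, which is the intuition behind constraining keypoint predictions to the hand region.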
| Module | AUC/% | MPJPE/px |
|---|---|---|
| No attention | 70.18 | 4.66 |
| MAB | 81.56 | 3.85 |
| MSB | 85.98 | 3.53 |
| PAB | 91.38 | 2.06 |
Tab. 4 Attention module ablation experimental results on CMU Panoptic dataset
| Model | AUC/% | MPJPE/px |
|---|---|---|
| Single path | 90.65 | 2.18 |
| Two paths | 90.88 | 2.10 |
| Three paths | 91.38 | 2.06 |
| Four paths | 91.25 | 2.09 |
Tab. 5 MRB ablation experimental results on CMU Panoptic dataset
| Model | AUC/% | MPJPE/px |
|---|---|---|
| Model 1 | 89.28 | 2.32 |
| Model 2 | 91.38 | 2.06 |
Tab. 6 Results of Mask attention ablation experiments on CMU Panoptic dataset
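One simple way a hand mask can steer a model toward the hand region is element-wise gating of a feature map. The sketch below shows that generic idea only. The actual Mask attention design evaluated in this ablation is not described on this page, and `gate_features` and its `floor` parameter are hypothetical names introduced for illustration.

```python
def gate_features(features, mask, floor=0.1):
    """Scale a 2D feature map by a mask-derived weight.

    Hand pixels (mask = 1) keep their full value; background pixels (mask = 0)
    are attenuated to `floor` times their value rather than zeroed out, so some
    context survives. Both inputs are nested lists of the same shape.
    """
    return [
        [f * (floor + (1.0 - floor) * m) for f, m in zip(feat_row, mask_row)]
        for feat_row, mask_row in zip(features, mask)
    ]

# One hand pixel followed by one background pixel.
features = [[2.0, 4.0]]
mask = [[1, 0]]
gated = gate_features(features, mask)
```

Keeping a non-zero `floor` is a design choice in this sketch: a hard zero would discard background context entirely, which can hurt when the segmentation mask itself is imperfect.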