Journal of Computer Applications, 2025, Vol. 45, Issue (12): 4012-4020. DOI: 10.11772/j.issn.1001-9081.2024111715

Section: Multimedia computing and computer simulation

Hand pose estimation based on mask prompts and attention

Jianhua REN¹, Jiahui CAO¹, Di JIA¹,²

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  2. Ordos Research Institute, Liaoning Technical University, Ordos, Inner Mongolia 017004, China
  • Received: 2024-12-05; Revised: 2025-04-10; Accepted: 2025-04-15; Online: 2025-04-22; Published: 2025-12-10
  • Contact: Jiahui CAO
  • About the authors: REN Jianhua, born in 1973, M. S., associate professor. His research interests include data mining, machine learning, and image processing.
    CAO Jiahui, born in 2000, M. S. candidate. Her research interests include image processing and pose estimation.
    JIA Di, born in 1982, Ph. D., professor. His research interests include computer vision and pose estimation.
  • Supported by:
    National Natural Science Foundation of China (61601213); Science and Technology Cooperation Cultivation Project of Ordos Research Institute of Liaoning Technical University (YJY-XD-2023-003)

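  • Author profiles: REN Jianhua (1973–), male, native of Shenyang, Liaoning; associate professor, M. S.; research interests: data mining, machine learning, image processing.
    CAO Jiahui (2000–), female, native of Donggang, Liaoning; M. S. candidate; research interests: image processing, pose estimation.
    JIA Di (1982–), male, native of Shenyang, Liaoning; professor, doctoral supervisor, Ph. D.; research interests: computer vision, pose estimation.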

Abstract:

Hand pose estimation is an important research direction in computer vision. Traditional methods are susceptible to interference from complex backgrounds, while deep learning methods, although more robust, still struggle in multi-hand scenarios and with fine-grained detail recognition. Therefore, a hand pose estimation method based on mask prompts and attention mechanisms, named HMCA (Hand Mask Prompts and Attention), was proposed. Firstly, hand mask maps generated through object detection and semantic segmentation were used to suppress background noise and provide prior information. Secondly, a Parallel Attention Block (PAB) and a Multi-path Residual Block (MRB) were designed to extract multi-scale features, improving the recognition of complex hand poses, reducing computational complexity, and preventing vanishing gradients. Thirdly, the hand mask maps were used to guide the model to focus on hand regions, addressing problems such as multiple hands and occlusion. Finally, a penalty term was added to the regression loss to constrain keypoint predictions and accelerate model convergence. Experimental results show that the proposed method outperforms the compared methods in single-hand, multi-hand, and occlusion scenarios, achieving the best Area Under the Curve (AUC) across varying thresholds and the best Mean Per Joint Position Error (MPJPE). On RHD (Rendered Handpose Dataset), the method achieves an AUC of 93.22% under varying thresholds and an MPJPE of 2.15; on the CMU Panoptic dataset, it achieves an AUC of 91.38% under varying thresholds and a mean hand keypoint error of 2.06.
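The abstract reports results as an AUC over varying thresholds and as an MPJPE, but it does not state the threshold range or the error units. The following minimal sketch shows the commonly used definitions, assuming that each keypoint error is the Euclidean distance between a predicted and a ground-truth keypoint, and that the AUC is the normalized area under a PCK (Percentage of Correct Keypoints) curve computed with the trapezoidal rule. The function names (`joint_errors`, `mpjpe`, `pck`, `pck_auc`) and the threshold range are hypothetical illustrations, not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # a 2D (x, y) or 3D (x, y, z) keypoint


def joint_errors(pred: Sequence[Point], gt: Sequence[Point]) -> List[float]:
    """Euclidean distance between each predicted keypoint and its ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mpjpe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Per Joint Position Error: average keypoint distance."""
    errors = joint_errors(pred, gt)
    return sum(errors) / len(errors)


def pck(errors: Sequence[float], threshold: float) -> float:
    """Fraction of keypoints whose error does not exceed `threshold`."""
    return sum(e <= threshold for e in errors) / len(errors)


def pck_auc(errors: Sequence[float], t_min: float, t_max: float, steps: int = 100) -> float:
    """Area under the PCK curve on [t_min, t_max], normalized to [0, 1].

    Hypothetical helper: samples `steps` evenly spaced thresholds and
    integrates the PCK values with the trapezoidal rule.
    """
    if steps < 2 or t_max <= t_min:
        raise ValueError("need steps >= 2 and t_max > t_min")
    thresholds = [t_min + (t_max - t_min) * i / (steps - 1) for i in range(steps)]
    values = [pck(errors, t) for t in thresholds]
    area = sum(
        (values[i] + values[i + 1]) / 2 * (thresholds[i + 1] - thresholds[i])
        for i in range(steps - 1)
    )
    return area / (t_max - t_min)


if __name__ == "__main__":
    pred = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    gt = [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0)]
    errs = joint_errors(pred, gt)  # distances 0, 5 and 1
    print("MPJPE:", mpjpe(pred, gt))
    print("PCK@1:", pck(errs, 1.0))
    print("AUC over [0, 10]:", round(pck_auc(errs, 0.0, 10.0, steps=1001), 3))
```

A higher AUC means more keypoints fall within small error thresholds, and a lower MPJPE means a smaller average error. The two numbers are therefore complementary rather than directly comparable. Because the AUC depends on the chosen threshold range, values are only comparable between papers that use the same range and units.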

Key words: hand pose estimation, mask prompt, attention mechanism, Convolutional Neural Network (CNN), semantic segmentation
