Journal of Computer Applications
任建华1,曹佳惠2,贾迪1
Abstract: Hand pose estimation is an important research direction in computer vision. Traditional methods are susceptible to interference from complex backgrounds, and deep learning approaches, although more robust, still struggle in multi-hand scenes and with fine-grained detail. To address these problems, a hand pose estimation method based on mask cueing and attention mechanisms was proposed. First, hand mask maps were generated by object detection and semantic segmentation to suppress background noise and provide prior information. Second, a parallel attention module and a multi-path residual module were designed to extract multi-scale features and improve the recognition of complex gestures, while reducing computational complexity and preventing gradient vanishing. Third, the hand mask maps were used to guide the model to focus on hand regions, addressing multi-hand overlap and occlusion. Finally, a penalty term was added to the regression loss function to constrain keypoint predictions to hand regions and accelerate model convergence. Experimental results showed that the proposed method outperformed the compared methods in single-hand, multi-hand, and occlusion scenarios. On the RHD dataset, it achieved a mean AUC (area under the curve) across thresholds of 93.22% and a mean hand keypoint error of 2.15; on the CMU Panoptic dataset, it achieved a mean AUC of 91.38% and a mean hand keypoint error of 2.06.
Key words: hand pose estimation, mask cueing, attention mechanism, convolutional neural network, semantic segmentation
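The abstract states that a penalty term in the regression loss constrains keypoint predictions to hand regions, but it does not give the formula. The following is only a minimal sketch of one way such a term could look, assuming a binary hand mask, 2D pixel coordinates, a mean squared error base loss, and a penalty equal to the fraction of predicted keypoints that fall outside the mask. The names `outside_mask`, `penalized_regression_loss`, and `penalty_weight` are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def outside_mask(point: Point, mask: Sequence[Sequence[int]]) -> bool:
    """Return True if the point lands off-image or on a pixel whose mask value is 0."""
    x, y = point
    col, row = int(round(x)), int(round(y))
    height, width = len(mask), len(mask[0])
    if not (0 <= row < height and 0 <= col < width):
        return True
    return mask[row][col] == 0


def penalized_regression_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    mask: Sequence[Sequence[int]],
    penalty_weight: float = 1.0,  # hypothetical hyperparameter
) -> float:
    """Mean squared keypoint error plus a weighted out-of-mask penalty (assumed form)."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    # Base regression term: mean squared Euclidean distance per keypoint.
    mse = sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, target)
    ) / len(pred)
    # Penalty term: fraction of predicted keypoints outside the hand mask.
    outside_fraction = sum(outside_mask(p, mask) for p in pred) / len(pred)
    return mse + penalty_weight * outside_fraction
```

This hard inside/outside test is not differentiable, so a trainable version would need a smooth surrogate, for example a distance to the mask boundary; that choice is likewise an assumption rather than something the abstract specifies.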
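Abstract (Chinese version, translated): Hand pose estimation is an important research direction in computer vision. Traditional methods are easily disturbed by complex backgrounds, and deep learning methods, although resistant to interference, remain deficient in multi-hand scenes and in detail recognition. This paper therefore proposes a hand pose estimation method based on mask cueing and attention mechanisms. First, object detection and semantic segmentation are used to generate hand mask maps, which screen out background noise and provide prior information. Second, a parallel attention module and a multi-path residual module are designed to extract multi-scale features, improve the recognition of complex gestures, reduce computational complexity, and prevent gradient vanishing. Third, the mask maps guide the model to focus on hand regions, addressing multi-hand overlap and occlusion. Finally, a penalty term is added to the regression loss to constrain keypoint predictions and accelerate convergence. Experimental results show that the method performs well in single-hand, multi-hand, and occlusion scenes, and achieves the best performance in both the mean area under the curve across thresholds and the hand keypoint error. On the RHD dataset, the mean area under the curve across thresholds is 93.22% and the mean hand keypoint error is 2.15; on the CMU Panoptic dataset, the mean area under the curve is 91.38% and the mean hand keypoint error is 2.06.
Key words (Chinese version, translated): hand pose estimation, mask cueing, attention mechanism, convolutional neural network, semantic segmentation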
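The two reported metrics, the mean area under the curve across thresholds and the mean hand keypoint error, can be illustrated with a short sketch. It assumes that the curve is the fraction of keypoints whose error is within a threshold (often called PCK in hand pose evaluation), that errors are Euclidean distances, and that the area is computed with the trapezoidal rule and normalized by the threshold range. The abstract does not state these details, and the function names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinates


def keypoint_errors(pred: Sequence[Point], target: Sequence[Point]) -> List[float]:
    """Euclidean distance between each predicted keypoint and its ground truth."""
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, target)]


def mean_error(errors: Sequence[float]) -> float:
    """Mean keypoint error over all keypoints."""
    return sum(errors) / len(errors)


def pck(errors: Sequence[float], threshold: float) -> float:
    """Fraction of keypoints whose error does not exceed the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def pck_auc(errors: Sequence[float], thresholds: Sequence[float]) -> float:
    """Trapezoidal area under the PCK curve, normalized to the range [0, 1]."""
    if len(thresholds) < 2:
        raise ValueError("at least two thresholds are required")
    values = [pck(errors, t) for t in thresholds]
    area = 0.0
    for i in range(len(thresholds) - 1):
        width = thresholds[i + 1] - thresholds[i]
        area += width * (values[i] + values[i + 1]) / 2
    return area / (thresholds[-1] - thresholds[0])
```

Under these assumptions, a higher AUC means more keypoints fall within each error threshold, while the mean keypoint error summarizes the same distances as a single average.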
CLC Number: TP391.41
任建华, 曹佳惠, 贾迪. Hand pose estimation based on mask cueing and attention [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2024111715.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024111715