Hand pose estimation is an important research direction in computer vision. Traditional methods are susceptible to interference from complex backgrounds, while deep learning methods, although more robust, still struggle with multi-hand scenes and fine-grained detail recognition. To address these problems, a hand pose estimation method based on mask prompts and attention mechanisms, named HMCA (Hand Mask Prompts and Attention), is proposed. First, hand mask maps generated via object detection and semantic segmentation are used to suppress background noise and provide prior information. Second, a Parallel Attention Block (PAB) and a Multi-path Residual Block (MRB) are designed to extract multi-scale features, which improves the recognition of complex hand poses, reduces computational complexity, and avoids vanishing gradients. Third, the hand mask maps are used to guide the model to focus on hand regions, which addresses issues such as multiple hands and occlusion. Finally, a penalty term is incorporated into the regression loss to constrain keypoint prediction and accelerate model convergence. Experimental results show that, among the compared methods, the proposed method achieves the best Area Under the Curve (AUC) and Mean Per Joint Position Error (MPJPE) under varying thresholds in single-hand, multi-hand, and occlusion scenarios. On the RHD (Rendered Handpose Dataset), it achieves an AUC of 93.22% and an MPJPE of 2.15; on the CMU Panoptic dataset, it achieves an AUC of 91.38% and a mean hand keypoint error of 2.06.
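To make the two reported metrics concrete, the following is a minimal sketch of how MPJPE and a threshold-based AUC are commonly computed for keypoint predictions. It is not the authors' evaluation code. The function names, the array layout `(samples, joints, dims)`, and the default threshold range (0 to 30 in 31 steps) are assumptions chosen for illustration; the AUC here is taken as the normalized area under the curve of the fraction of joints whose error falls within each threshold.

```python
import numpy as np

def joint_errors(pred, gt):
    """Per-joint Euclidean distance; pred and gt have shape (N, J, D)."""
    return np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average distance over all joints."""
    return float(joint_errors(pred, gt).mean())

def pck_auc(pred, gt, t_min=0.0, t_max=30.0, steps=31):
    """Normalized area under the correct-keypoint curve.

    The threshold range is an assumption, not taken from the paper.
    """
    errors = joint_errors(pred, gt).ravel()
    thresholds = np.linspace(t_min, t_max, steps)
    # Fraction of joints whose error is within each threshold.
    pck = np.array([(errors <= t).mean() for t in thresholds])
    # Trapezoidal integration, normalized so a perfect model scores 1.0.
    area = np.sum((pck[1:] + pck[:-1]) * np.diff(thresholds)) / 2.0
    return float(area / (t_max - t_min))
```

Under these assumptions, a lower MPJPE and a higher AUC both indicate more accurate keypoints; the AUC additionally summarizes accuracy across the whole range of tolerances rather than at a single threshold.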