Journal of Computer Applications (《计算机应用》), sole official website


Lightweight human pose estimation network based on redundant feature suppression

Lü Chao (吕超), Ma Geyao (马歌谣)

  1. Changchun University of Science and Technology
  • Received: 2025-06-24 Revised: 2025-09-05 Online: 2025-09-17 Published: 2025-09-17
  • Corresponding author: Lü Chao (吕超)
  • Supported by:
    Natural Science Foundation of Jilin Province; National Key Research and Development Program of China

Abstract: To address the difficulty that existing human pose estimation networks have in balancing computational efficiency and localization accuracy in complex scenes, a lightweight human pose estimation network based on redundant feature suppression was proposed and named LE-SHNet (Lightweight Enhanced Stacked Hourglass Network). First, a Multiple Separated Hourglass Module (MSHM) was designed inside the hourglass modules, using heterogeneous convolution branches to model large joints and distal limbs differently while suppressing redundant computation. Second, Shuffle Efficient Channel Attention (SECA) was introduced between hourglass modules, combining channel shuffling with adaptive-kernel convolution to strengthen cross-level joint correlations with zero additional parameters. Finally, a Spatial and Channel Perception Module (SCPM) was built in the non-hourglass modules, using spatial-channel reconstruction and triplet attention to strengthen the perception of key regions. Experimental results show that LE-SHNet achieves accuracies of 88.7% on the MPII (Max Planck Institute for Informatics) dataset and 71.3% on the COCO2017 (Common Objects in COntext 2017) dataset. Compared with the baseline 2-SHNet (2 Stacked Hourglass Network), it reduces the parameter count by 49.3% and the computational cost by 28.2% while improving accuracy by 1.1 percentage points. Compared with EL-HRNet (Efficient and Lightweight High-Resolution Network) and MobileMultiPose (Mobile-friendly and Multi-feature aggregation Pose estimation), lightweight human pose estimation networks proposed in 2024 and 2025, LE-SHNet improves accuracy by 1.0 and 0.8 percentage points and reduces the parameter count by 32.0% and 26.7%, respectively. LE-SHNet therefore improves keypoint localization accuracy while remaining lightweight, which gives it potential for real-time deployment on edge devices in applications such as intelligent surveillance, human-computer interaction, and exercise rehabilitation.

Key words: computer vision, human pose estimation, Multiple Separated Hourglass Module (MSHM), Shuffle Efficient Channel Attention (SECA), Spatial and Channel Perception Module (SCPM), redundant feature suppression, multi-scale feature fusion
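
The abstract describes SECA only as a combination of channel shuffling and adaptive-kernel convolution, so the following is a rough sketch of those two ingredients, not the authors' implementation. It assumes the ShuffleNet-style channel shuffle and the kernel-size rule from the original ECA-Net (gamma = 2, b = 1). The function names, the ordering of the steps, and the use of per-channel means as input are all hypothetical choices made for illustration.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net rule: truncate (log2(C) + b) / gamma, then bump even values to odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def channel_shuffle(values, groups):
    """ShuffleNet-style shuffle: view channels as (groups, C/groups), transpose, flatten."""
    per_group = len(values) // groups
    if per_group * groups != len(values):
        raise ValueError("channel count must be divisible by groups")
    return [values[g * per_group + i] for i in range(per_group) for g in range(groups)]


def conv1d_same(values, kernel):
    """1D cross-correlation along the channel axis with zero padding (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def shuffled_channel_gates(channel_means, kernel, groups):
    """Assumed pipeline: shuffle the channel descriptors, mix neighbours, apply sigmoid.

    Returns one gate in (0, 1) per channel, in the shuffled channel order.
    """
    shuffled = channel_shuffle(channel_means, groups)
    mixed = conv1d_same(shuffled, kernel)
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]
```

In this sketch the shuffle changes which channels sit next to each other before the 1D convolution, so the small kernel mixes descriptors from different groups. Whether LE-SHNet applies the shuffle before or after the convolution, and how the gates are mapped back to the feature map, is not stated in the abstract.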

CLC number: