Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (1): 223-233. DOI: 10.11772/j.issn.1001-9081.2024010099

• Multimedia Computing and Computer Simulation •

基于解耦注意力与幻影卷积的轻量级人体姿态估计

陈俊颖1, 郭士杰1,2,3, 陈玲玲4

  1. 复旦大学 工程与应用技术研究院,上海 200433
  2. 河北工业大学 机械工程学院,天津 300130
  3. 智能康复装置与检测技术教育部工程研究中心(河北工业大学),天津 300401
  4. 河北工业大学 人工智能与数据科学学院,天津 300130
  • 收稿日期:2024-01-26 修回日期:2024-03-25 接受日期:2024-03-25 发布日期:2024-05-09 出版日期:2025-01-10
  • 通讯作者: 郭士杰
  • 作者简介:陈俊颖(2000—),男,湖南常德人,硕士研究生,主要研究方向:计算机视觉、人体姿态估计;
    陈玲玲(1981—),女,河北张家口人,教授,博士,主要研究方向:计算机视觉、助老助残机器人。
  • 基金资助:
    河北省省级科技计划项目(22372001D);河北省自然科学基金资助项目(F2021202021)

Lightweight human pose estimation based on decoupled attention and ghost convolution

Junying CHEN1, Shijie GUO1,2,3, Lingling CHEN4

  1. Academy for Engineering and Technology, Fudan University, Shanghai 200433, China
  2. School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
  3. Intelligent Rehabilitation Devices and Detection Technology Engineering Research Center of the Ministry of Education (Hebei University of Technology), Tianjin 300401, China
  4. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300130, China
  • Received: 2024-01-26 Revised: 2024-03-25 Accepted: 2024-03-25 Online: 2024-05-09 Published: 2025-01-10
  • Contact: Shijie GUO
  • About author: CHEN Junying, born in 2000, M. S. candidate. His research interests include computer vision and human pose estimation.
    CHEN Lingling, born in 1981, Ph. D., professor. Her research interests include computer vision and robots for the elderly and disabled.
  • Supported by:
    Science and Technology Program of Hebei Province (22372001D); Natural Science Foundation of Hebei Province (F2021202021)

摘要:

随着轻量级网络的发展,人体姿态估计任务得以在计算资源有限的设备上执行,然而,提升精度变得更具有挑战性。这些挑战主要源于网络复杂度与计算资源的矛盾,导致模型在简化时牺牲了表示能力。针对上述问题,提出一种基于解耦注意力和幻影卷积的轻量级人体姿态估计网络(DGLNet)。具体来说,DGLNet以小型高分辨率网络(Small HRNet)模型为基础架构,通过引入解耦注意力机制构建DFDbottleneck模块;采用shuffleblock的结构对基础模块进行重新设计,即用轻量级幻影卷积替代计算量大的点卷积,并利用解耦注意力机制增强模块性能,从而构建DGBblock模块;此外,用幻影卷积和解耦注意力重新构建的深度可分离卷积模块来替代原过渡层模块,从而构建GSCtransition模块,进一步减少计算量并增强特征交互性和提高性能。在COCO验证集上的实验结果显示,DGLNet优于轻量级高分辨率网络(Lite-HRNet),在计算量和参数量不增加的情况下,最高精度达到了71.9%;与常见的轻量级姿态估计网络MobileNetV2和ShuffleNetV2相比,DGLNet在仅使用21.2%和25.0%的计算量情况下分别实现了4.6和8.3个百分点的精度提升;在AP50的评价标准上,DGLNet超过了大型高分辨率网络(HRNet)的同时计算量和参数量远小于HRNet。

关键词: 人体姿态估计, 轻量级网络, 注意力机制, 幻影卷积, 深度可分离卷积模块

Abstract:

With the development of lightweight networks, human pose estimation can now run on devices with limited computational resources; however, improving accuracy has become more challenging. The difficulty stems mainly from the conflict between network complexity and available computational resources, which causes models to lose representational capacity as they are simplified. To address this problem, a Decoupled attention and Ghost convolution based Lightweight human pose estimation Network (DGLNet) is proposed. Specifically, DGLNet takes the Small High-Resolution Network (Small HRNet) as its base architecture and introduces a decoupled attention mechanism to construct the DFDbottleneck module. The basic modules are redesigned with a shuffleblock structure, in which computationally expensive pointwise convolutions are replaced with lightweight ghost convolutions and the decoupled attention mechanism is used to enhance module performance, yielding the DGBblock module. In addition, the original transition layer modules are replaced with depthwise separable convolution modules rebuilt with ghost convolution and decoupled attention, yielding the GSCtransition module, which further reduces computation while strengthening feature interaction and improving performance. Experimental results on the COCO validation set show that DGLNet outperforms the state-of-the-art Lite-High-Resolution Network (Lite-HRNet), reaching a maximum accuracy of 71.9% without increasing computation or the number of parameters. Compared with the common lightweight pose estimation networks MobileNetV2 and ShuffleNetV2, DGLNet improves accuracy by 4.6 and 8.3 percentage points, respectively, while using only 21.2% and 25.0% of their computation. Furthermore, under the AP50 evaluation criterion, DGLNet surpasses the large High-Resolution Network (HRNet) while requiring far less computation and far fewer parameters.
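
As a rough illustration of why replacing pointwise convolutions with ghost convolutions can save cost, the sketch below counts weights for both. It assumes a GhostNet-style ghost convolution (a 1×1 primary convolution followed by a cheap depthwise convolution). The ratio of 2, the 3×3 cheap kernel, the 64-channel widths, and the function names are illustrative assumptions, not values taken from DGLNet.

```python
def pointwise_conv_params(c_in: int, c_out: int) -> int:
    """Weights in a 1x1 (pointwise) convolution, bias ignored."""
    return c_in * c_out


def ghost_conv_params(c_in: int, c_out: int, ratio: int = 2, cheap_kernel: int = 3) -> int:
    """Weights in a GhostNet-style ghost convolution, bias ignored.

    A 1x1 primary convolution produces c_out // ratio "intrinsic" channels;
    a depthwise cheap_kernel x cheap_kernel convolution then derives the
    remaining "ghost" channels from them.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    ghost = c_out - intrinsic
    primary = c_in * intrinsic
    # Depthwise cheap operation: one k x k filter per ghost channel.
    cheap = ghost * cheap_kernel * cheap_kernel
    return primary + cheap


c_in, c_out = 64, 64  # hypothetical channel widths, not from the paper
pw = pointwise_conv_params(c_in, c_out)
gh = ghost_conv_params(c_in, c_out)
print(pw, gh, round(gh / pw, 3))  # prints: 4096 2336 0.57
```

Under these assumptions the ghost convolution needs about 57% of the weights of the pointwise convolution; the saving in an actual network depends on its channel widths and ghost ratio.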

Key words: human pose estimation, lightweight network, attention mechanism, ghost convolution, depthwise separable convolution module

CLC Number: