Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2407-2414. DOI: 10.11772/j.issn.1001-9081.2021061103

• Artificial Intelligence •

Lightweight human pose estimation based on attention mechanism

Kun LI1, Qing HOU1,2

  1. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
  2. Guizhou Communication Industry Service Company Limited, Guiyang, Guizhou 550005, China
  • Received: 2021-06-29 Revised: 2021-09-21 Accepted: 2021-09-28 Online: 2022-08-09 Published: 2022-08-10
  • Contact: Qing HOU
  • About author: LI Kun, born in 1997 in Weifang, Shandong, M. S. candidate. His research interests include image processing and computer vision.
    HOU Qing, born in 1980 in Tianjin, Ph. D., research fellow, CCF member. His research interests include data mining and image processing.
  • Supported by:
    National Innovation City “Hundred Cities and Hundred Gardens” Action Project (Architectural Science Project [2020] No. 22), Guizhou Big Data Special Program (20200424)

Abstract:

High-resolution human pose estimation networks have large parameter counts and high computational complexity. To address these problems, a lightweight Sandglass Coordinate Attention Network (SCANet) based on the High-Resolution Network (HRNet) was proposed for human pose estimation. First, the Sandglass module and the Coordinate Attention (CoordAttention) module were introduced. Then, two lightweight modules were built on them: the Sandglass Coordinate Attention bottleneck (SCAneck) module and the Sandglass Coordinate Attention basicblock (SCAblock) module. These modules capture long-range dependencies along the spatial directions of the feature map, together with precise position information, while reducing the number of model parameters and the computational complexity. Experimental results show that, with the same image resolution and environment configuration, SCANet reduces the number of parameters by 52.6% and the computational complexity by 60.6% compared with HRNet on the Common Objects in COntext (COCO) validation set; on the Max Planck Institute for Informatics (MPII) validation set, the corresponding reductions are 52.6% and 61.1%. Compared with common human pose estimation networks such as Stacked Hourglass Network (Hourglass), Cascaded Pyramid Network (CPN) and SimpleBaseline, SCANet still predicts human body keypoints with high accuracy while using fewer parameters and lower computational complexity.

Key words: human pose estimation, deep neural network, High Resolution Network (HRNet), depthwise separable convolution, attention mechanism
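
The key words name depthwise separable convolution, which is a common way to cut parameter counts. As a rough illustration of why it helps, the sketch below counts the weights of a standard k x k convolution and of its depthwise separable counterpart (biases ignored), and computes a relative reduction in percent. The function names and the example layer size are hypothetical and are not taken from the paper; SCANet's actual SCAneck and SCAblock modules are not reproduced here.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_percent(baseline: float, reduced: float) -> float:
    """Relative reduction of `reduced` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - reduced) / baseline


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, 3 x 3 kernel (not a figure from the paper).
    std = standard_conv_params(64, 64, 3)
    dws = depthwise_separable_params(64, 64, 3)
    print(f"standard: {std}, depthwise separable: {dws}")
    print(f"reduction: {reduction_percent(std, dws):.1f}%")
```

The same `reduction_percent` formula is how a figure such as "52.6% fewer parameters" is normally read: the difference between the baseline and the lighter model, divided by the baseline. Whole-network savings are smaller than single-layer savings because not every layer is replaced.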

CLC Number: