Journal of Computer Applications, 2025, Vol. 45, Issue (12): 4004-4011. DOI: 10.11772/j.issn.1001-9081.2024121819

• Multimedia computing and computer simulation •

轻量且高精度增强的姿态检测网络HG-YOLO

崔家礼1, 刘永基1, 李子贺1, 郑瀚1,2   

  1. 北方工业大学 信息学院,北京 100144
  2. 广西高校人工智能与信息处理重点实验室(河池学院),广西 河池 546300
  • 收稿日期:2024-12-27 修回日期:2025-04-09 接受日期:2025-04-14 发布日期:2025-04-28 出版日期:2025-12-10
  • 通讯作者: 郑瀚
  • 作者简介:崔家礼(1975—),男,山东枣庄人,副研究员,博士,CCF会员,主要研究方向:图像处理、模式识别
    刘永基(1999—),男,山东枣庄人,硕士研究生,主要研究方向:图像处理、模式识别
    李子贺(2000—),男,天津人,硕士研究生,主要研究方向:图像处理、模式识别
    郑瀚(1987—),女,江苏泰州人,副教授,博士研究生,主要研究方向:图像处理、计算机视觉。
  • 基金资助:
    广西高校人工智能与信息处理重点实验室(河池学院)开放研究基金资助项目(2024GXZDSY006)

HG-YOLO: lightweight and high-precision enhancement pose detection network

Jiali CUI1, Yongji LIU1, Zihe LI1, Han ZHENG1,2   

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  2. Key Laboratory of AI and Information Processing of Guangxi Colleges and Universities (Hechi University), Hechi, Guangxi 546300, China
  • Received: 2024-12-27 Revised: 2025-04-09 Accepted: 2025-04-14 Online: 2025-04-28 Published: 2025-12-10
  • Contact: Han ZHENG
  • About the authors: CUI Jiali, born in 1975, Ph.D., associate research fellow. His research interests include image processing and pattern recognition.
    LIU Yongji, born in 1999, M.S. candidate. His research interests include image processing and pattern recognition.
    LI Zihe, born in 2000, M.S. candidate. His research interests include image processing and pattern recognition.
    ZHENG Han, born in 1987, Ph.D. candidate, associate professor. Her research interests include image processing and computer vision.
  • Supported by:
    Open Fund of Key Laboratory of AI and Information Processing of Guangxi Colleges and Universities (Hechi University) (2024GXZDSY006)

摘要:

在人体姿态检测任务中,现有的深度学习网络存在检测精度不足、网络参数复杂和计算成本高等问题,严重限制了它们的应用。为了解决这些问题,提出一种轻量且高精度的姿态检测改进网络HG-YOLO (High-precision and Ghost YOLO)。针对检测精度不足的问题,在HG-YOLO的主干网络,融合基于Transformer的检测网络RT-DETR (Real-Time DEtection TRansformer),并将大型可分离核注意力(LSKA)模块嵌入主干网络中,以在不增加内存占用和计算复杂性的基础上,提高网络应对复杂场景的特征提取能力,从而提高人体姿态的检测精度。针对网络参数复杂和计算成本高的问题,引入轻量化的Ghost卷积模块替换部分标准卷积,此外,在HG-YOLO的检测头部分,设计一种共享卷积检测头,以通过参数和权重共享机制减少卷积计算,从而降低网络的参数量和计算复杂度。在COCO (Common Objects in COntext) 2017-Keypoints数据集和CrowdPose数据集上的实验结果表明,与基准的YOLOv8-Pose网络相比,HG-YOLO的参数量减少了32%,浮点运算量减少了18%;在规模为小型(s)时,在COCO 2017-Keypoints数据集上,AP50 (Average Precision at OKS (Object Keypoint Similarity) of 0.50)提升了0.8个百分点,在CrowdPose数据集上,AP提升了2.9个百分点。可见,HG-YOLO不仅轻量,而且检测精度高,是人体姿态检测领域的优秀网络模型。

关键词: 姿态检测, 高精度增强, 大型可分离核注意力, 共享卷积检测头, 轻量化网络

Abstract:

In human pose detection tasks, existing deep learning networks suffer from insufficient detection precision, a large number of parameters, and high computational cost, which severely limits their application. To address these problems, a lightweight, high-precision pose detection network, HG-YOLO (High-precision and Ghost YOLO), is proposed. To improve detection precision, the Transformer-based detection network RT-DETR (Real-Time DEtection TRansformer) is integrated into the backbone of HG-YOLO, and a Large Separable Kernel Attention (LSKA) module is embedded in the backbone; this strengthens feature extraction in complex scenes without increasing memory usage or computational complexity, and thereby improves pose detection precision. To reduce the parameter count and computational cost, lightweight Ghost convolution modules replace some of the standard convolutions, and a shared convolution detection head is designed for the detection head of HG-YOLO; by sharing parameters and weights, it reduces convolution computation and thus the number of parameters and the computational complexity of the network. Experimental results on the COCO (Common Objects in COntext) 2017-Keypoints and CrowdPose datasets show that, compared with the baseline YOLOv8-Pose network, HG-YOLO reduces the number of parameters by 32% and the floating-point operations by 18%. At the small (s) scale, AP50 (Average Precision at an OKS (Object Keypoint Similarity) threshold of 0.50) improves by 0.8 percentage points on COCO 2017-Keypoints, and AP improves by 2.9 percentage points on CrowdPose. HG-YOLO is therefore both lighter and more accurate than the baseline for human pose detection.
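
The AP50 figure in the abstract counts a predicted pose as correct when its OKS with the ground truth is at least 0.50. The following is a minimal sketch of the standard COCO-style OKS computation, not the authors' evaluation code; the function name `oks`, the argument layout, and the sigma values in the usage note are illustrative assumptions.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt : lists of (x, y) keypoint coordinates, in the same order
    visible  : per-keypoint flags for the ground truth (> 0 means labeled)
    area     : ground-truth object area, used as the squared scale s^2
    sigmas   : per-keypoint falloff constants (assumed values, see usage note)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k = 2.0 * sigma                        # COCO convention: k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0
```

As a usage example, `oks([(10, 10), (20, 20)], [(10, 10), (20, 20)], [1, 1], 100.0, [0.05, 0.05])` scores a perfect prediction, and comparing the result against 0.50 gives the AP50 matching decision. Real COCO evaluation uses a fixed table of 17 per-keypoint sigmas rather than the placeholder value 0.05 used here.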

Key words: pose detection, high-precision enhancement, Large Separable Kernel Attention (LSKA), shared convolutional detection head, lightweight network
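
To give a rough sense of why replacing standard convolutions with Ghost convolutions makes a network lighter, the sketch below compares weight counts for a single layer. It assumes the common Ghost module design (a primary convolution producing `c_out / ratio` channels, followed by cheap depthwise operations for the rest), ignores biases and normalization layers, and uses hypothetical defaults `ratio=2` and `cheap_k=5`; it does not attempt to reproduce the 32% network-level reduction reported in the abstract.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=5):
    """Weight count of a Ghost module under the assumptions stated above.

    c_out is assumed to be divisible by ratio.
    """
    intrinsic = c_out // ratio                      # channels from the primary conv
    primary = c_in * intrinsic * k * k              # ordinary convolution
    cheap = intrinsic * (ratio - 1) * cheap_k ** 2  # depthwise "cheap" operations
    return primary + cheap


if __name__ == "__main__":
    std = standard_conv_params(128, 256, 3)
    ghost = ghost_conv_params(128, 256, 3)
    print(std, ghost, round(ghost / std, 3))
```

For this single hypothetical layer (128 input channels, 256 output channels, 3 x 3 kernel), the Ghost variant needs roughly half the weights of the standard convolution. The saving for a whole network is smaller because only some layers are replaced.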

CLC Number: