Journal of Computer Applications: the only official website


Rethinking Heatmap Errors in Human Pose Estimation

Yang Feiyu1, Song Zhan1, Xiao Zhenzhong1, Mo Yaoyang2, Chen Yu3, Pan Zhe3, Zhang Min3, Zhang Yao3, Qian Beibei1, Tang Chaowei4, Jin Wu1

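    1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
    2. Orbbec Technology Co., Ltd.
    3. Orbbec Technology Group Co., Ltd.
    4. Nanjing University of Aeronautics and Astronautics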
  • Received: 2021-05-17 Revised: 2021-11-04 Online: 2021-12-03 Published: 2021-12-03
  • Corresponding author: Yang Feiyu

Rethinking Errors in Human Pose Estimation Heatmap

  • Received:2021-05-17 Revised:2021-11-04 Online:2021-12-03 Published:2021-12-03

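Abstract (translated from the Chinese): In recent years, heatmap-based algorithms have dominated human pose estimation. Heatmap decoding (i.e., converting heatmaps into human joint coordinates) is a fundamental step of these algorithms. Current heatmap decoding algorithms do not consider the effect of systematic errors; therefore, this paper proposes an error-compensation-based heatmap decoding algorithm for human pose estimation. The algorithm first estimates the model's error compensation factor during training and then, at inference, uses that factor to compensate the prediction error of the human joints, which includes both systematic and random errors. Extensive experiments on different network architectures, input resolutions, evaluation metrics, and datasets show that the proposed method achieves significant accuracy gains over the current best heatmap decoding method. Specifically, the HRNet-W48-256×192 model gains 2.9 AP on the COCO dataset, and the PCKh0.1 of the ResNet-152-256×256 model improves by 7.8% on the MPII dataset. In addition, because the method requires neither Gaussian smoothing preprocessing nor derivative computation, it is about twice as fast as the current best method. The method has practical value for high-accuracy, high-speed human pose estimation.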

Abstract: Recently, the leading performance in human pose estimation has been dominated by heatmap-based methods. Heatmap decoding (i.e., transforming heatmaps into joint coordinates) is a fundamental step of these methods, yet it has received limited investigation, and previous decoding methods generally neglect the effect of systematic errors. This work fills the gap by studying heatmap decoding with a particular focus on errors. We reveal, for the first time, that heatmap-based methods suffer from significant errors that were previously ignored. To tackle this issue, an error-compensation-based heatmap decoding method is proposed: an optimal error compensation factor Δ_opt is defined to describe the error properties of a network, estimated during training, and then used to compensate errors during inference. The method reduces systematic and random errors in one shot with negligible extra computation. Extensive experiments with different network architectures, input resolutions, evaluation metrics, and datasets show that the proposed method achieves significant accuracy gains over the state-of-the-art heatmap decoding method. Specifically, the HRNet-W48-256×192 model is improved by 2.9 AP on the COCO dataset, and the PCKh0.1 of the ResNet-152-256×256 model is improved by 7.8% on the MPII dataset. Moreover, unlike existing methods, the proposed method is smoothing-free and about 2 times faster (1.4 vs. 3.0 ms/f) than the state of the art. The work offers practical value for developing fast and accurate human pose estimation.
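
The abstract does not state how Δ_opt is defined or applied, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that decoding takes the heatmap argmax and shifts it by a fixed amount toward the higher neighbouring pixel on each axis, and that the shift is chosen by a grid search that minimises the mean error on labelled data. The names `decode_heatmap`, `estimate_delta`, and `gaussian_heatmap` are hypothetical.

```python
import numpy as np


def decode_heatmap(heatmap, delta):
    """Argmax decoding, shifted by `delta` pixels toward the higher neighbour.

    Assumption: the compensation is a single scalar shift per axis.
    Returns the (x, y) coordinate as a float array.
    """
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    coord = np.array([x, y], dtype=np.float64)
    # Only shift when both neighbours exist (peak not on the border).
    if 0 < x < w - 1:
        coord[0] += delta * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        coord[1] += delta * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    return coord


def estimate_delta(heatmaps, targets, candidates):
    """Grid search: return the candidate shift with the lowest mean
    Euclidean error between decoded and ground-truth coordinates."""
    errors = []
    for delta in candidates:
        preds = np.array([decode_heatmap(hm, delta) for hm in heatmaps])
        errors.append(np.linalg.norm(preds - targets, axis=1).mean())
    return float(candidates[int(np.argmin(errors))])


def gaussian_heatmap(center, shape=(64, 48), sigma=2.0):
    """Synthetic heatmap for testing: a Gaussian centred at (x, y)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2)
                  / (2 * sigma ** 2))


if __name__ == "__main__":
    # Toy data whose true sub-pixel offsets all have magnitude 0.3.
    targets = np.array([[20.3, 30.3], [10.7, 15.7], [25.3, 40.7]])
    heatmaps = [gaussian_heatmap(t) for t in targets]
    candidates = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
    delta = estimate_delta(heatmaps, targets, candidates)
    print("estimated shift:", delta)
    print("decoded joint:", decode_heatmap(heatmaps[0], delta))
```

In this toy setup the shift is fitted once on labelled data and then reused for every prediction, so inference needs only an argmax and two comparisons per joint, with no smoothing or derivative computation.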

CLC number: