Journal of Computer Applications, 2022, Vol. 42, Issue (3): 953-959. DOI: 10.11772/j.issn.1001-9081.2021030427

• Multimedia computing and computer simulation •

Improved anchor-based 3D hand pose estimation network

Dejian WEI (危德健), Wenming WANG (王文明), Quanyu WANG (王全玉), Haopan REN (任好盼), Yanyan GAO (高彦彦), Zhi WANG (王志)

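  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
  • Received: 2021-03-22 Revised: 2021-05-26 Accepted: 2021-05-31 Online: 2022-04-09 Published: 2022-03-10
  • Corresponding author: Quanyu WANG (王全玉)
  • About the authors: WEI Dejian (危德健), born in 1996, male, from Nanping, Fujian, M. S. candidate. His research interests include computer vision and deep learning.
    WANG Wenming (王文明), born in 1967, male, from Beijing, M. S., associate professor. His research interests include human-computer interaction and software intelligence.
    REN Haopan (任好盼), born in 1995, male, from Xuchang, Henan, M. S. candidate. His research interests include computer vision, deep learning and pose estimation.
    GAO Yanyan (高彦彦), born in 1986, male, from Zhengding, Hebei, M. S. candidate. His research interests include embedded systems, computer vision and deep learning.
    WANG Zhi (王志), born in 1996, male, from Nanjing, Jiangsu, M. S. candidate. His research interests include computer vision and deep learning.
  • Supported by:
    National Natural Science Foundation of China (71834001)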

Improved 3D hand pose estimation network based on anchor

Dejian WEI, Wenming WANG, Quanyu WANG, Haopan REN, Yanyan GAO, Zhi WANG

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2021-03-22 Revised: 2021-05-26 Accepted: 2021-05-31 Online: 2022-04-09 Published: 2022-03-10
  • Contact: Quanyu WANG
  • About the authors: WEI Dejian, born in 1996, M. S. candidate. His research interests include computer vision and deep learning.
    WANG Wenming, born in 1967, M. S., associate professor. His research interests include human-computer interaction and software intelligence.
    REN Haopan, born in 1995, M. S. candidate. His research interests include computer vision, deep learning and pose estimation.
    GAO Yanyan, born in 1986, M. S. candidate. His research interests include embedded systems, computer vision and deep learning.
    WANG Zhi, born in 1996, M. S. candidate. His research interests include computer vision and deep learning.
  • Supported by:
    National Natural Science Foundation of China (71834001)

Abstract:

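Anchor-based methods for 3D hand pose estimation have become popular in recent years, and Anchor-to-Joint (A2J) is one of the most representative. A2J densely places anchor points on the depth map and uses a neural network to predict the offset from each anchor point to each key point, together with a weight for each anchor point. From the predicted offsets and weights, A2J computes the key point coordinates as a weighted sum, which reduces the noise in the network's regression results. Although A2J is simple and efficient, its unsuitable network structure and loss function limit the accuracy of the network, so an improved network, HigherA2J, is proposed. First, a single branch predicts the XYZ offsets from anchor points to key points, making better use of the 3D nature of the depth map. Second, the branch structure of A2J is simplified to reduce the number of network parameters. Finally, a key point estimation loss function is designed and combined with the offset estimation loss, which effectively improves estimation accuracy. Experiments on three datasets, NYU, ICVL and HANDS 2017, show that the average hand pose estimation error is lower than that of A2J on all three, by 0.32 mm, 0.35 mm and 0.10 mm respectively.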
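The weighted summation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `softmax` and `estimate_joint`, the use of softmax to normalize anchor weights, and the convention that anchors sit at depth 0 are all assumptions made for illustration. Each anchor "votes" for a key point position (anchor position plus predicted offset), and the votes are averaged using the normalized weights.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_joint(anchors, offsets, scores):
    """Estimate one key point as a weighted sum over anchor votes.

    anchors: list of (x, y, z) anchor positions (z = 0 is an assumption here)
    offsets: list of predicted (dx, dy, dz) from each anchor to the key point
    scores:  list of raw per-anchor scores, turned into weights by softmax
    """
    weights = softmax(scores)
    joint = [0.0, 0.0, 0.0]
    for anchor, offset, w in zip(anchors, offsets, weights):
        for d in range(3):
            # Each anchor votes for the position anchor + offset.
            joint[d] += w * (anchor[d] + offset[d])
    return tuple(joint)

# Two anchors vote for the same key point; their X errors have opposite sign.
anchors = [(0.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
offsets = [(1.4, 1.0, 5.0), (-1.4, -1.0, 5.0)]
joint = estimate_joint(anchors, offsets, scores=[0.0, 0.0])
print(joint)
```

With equal weights, the two opposite X errors cancel and the estimate lands close to (1.0, 1.0, 5.0), which is the noise-reducing effect the abstract attributes to the weighted sum.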

Keywords: 3D hand pose estimation, deep learning, convolutional neural network, anchor, loss function

Abstract:

In recent years, anchor-based 3D hand pose estimation methods have become popular, and Anchor-to-Joint (A2J) is one of the more representative methods. In A2J, anchor points are densely set on the depth map, and a neural network is used to predict the offsets between anchor points and key points together with the weights of the anchor points; the predicted offsets and weights are used to calculate the coordinates of key points by weighted summation, which reduces noise in the network regression results. A2J is simple and efficient, but its ill-suited network structure and loss function affect the accuracy of the network. Therefore, an improved network, HigherA2J, was proposed. Firstly, a single branch was used to predict the XYZ offsets between anchor points and key points to better utilize the 3D characteristics of the depth map; secondly, the network branch structure of A2J was simplified to reduce the number of network parameters; finally, a loss function for key point estimation was designed and combined with the offset estimation loss, which effectively improved the estimation accuracy. Experimental results on three datasets, NYU, ICVL and HANDS 2017, show that the average hand pose estimation error is reduced by 0.32 mm, 0.35 mm and 0.10 mm respectively compared with A2J.
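The abstract states only that a key point estimation loss is combined with an offset estimation loss; it does not give the formulas. The sketch below shows one plausible way such a combination could look. The smooth L1 penalty, the loss weights `w_keypoint` and `w_offset`, and all function names are assumptions for illustration, not the definitions used in the paper.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def keypoint_loss(pred_joint, gt_joint):
    """Penalize the error of the final (aggregated) key point estimate."""
    return sum(smooth_l1(p - g) for p, g in zip(pred_joint, gt_joint)) / 3

def offset_loss(anchors, pred_offsets, gt_joint):
    """Penalize each anchor's offset against its true offset to the key point."""
    total = 0.0
    for anchor, offset in zip(anchors, pred_offsets):
        for d in range(3):
            target = gt_joint[d] - anchor[d]
            total += smooth_l1(offset[d] - target)
    return total / (3 * len(anchors))

def combined_loss(pred_joint, anchors, pred_offsets, gt_joint,
                  w_keypoint=1.0, w_offset=1.0):
    """Weighted sum of the two terms (weights are hypothetical)."""
    return (w_keypoint * keypoint_loss(pred_joint, gt_joint)
            + w_offset * offset_loss(anchors, pred_offsets, gt_joint))

# Offsets are exact, but the aggregated key point is 2 units off in depth.
gt = (1.0, 1.0, 5.0)
anchors = [(0.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
exact_offsets = [(1.0, 1.0, 5.0), (-1.0, -1.0, 5.0)]
loss = combined_loss((1.0, 1.0, 7.0), anchors, exact_offsets, gt)
print(loss)
```

In this toy case the offset term is zero and only the key point term contributes, which shows why supervising both quantities gives the network two complementary error signals.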

Key words: 3D hand pose estimation, deep learning, Convolutional Neural Network (CNN), anchor, loss function

CLC Number: