Journal of Computer Applications, 2021, Vol. 41, Issue (2): 550-555. DOI: 10.11772/j.issn.1001-9081.2020050721

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

3D hand pose estimation based on label distribution learning

LI Weiqiang, LEI Hang, ZHANG Jingyu, WANG Xupeng   

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 610054, China
  • Received: 2020-05-29  Revised: 2020-08-10  Online: 2021-02-10  Published: 2020-08-26
  • Supported by:
    This work is partially supported by the Science and Technology Program of Sichuan Province (2018GZDZX0040).


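  • Corresponding author: LI Weiqiang
  • About the authors: LI Weiqiang (李伟强, born 1995), male, from Tianjin, M.S. candidate; research interests: computer vision and deep learning. LEI Hang (雷航, born 1960), male, from Zigong, Sichuan, professor, Ph.D.; research interests: embedded software and trusted software testing. ZHANG Jingyu (张静玉, born 1998), female, from Heze, Shandong, M.S. candidate; research interests: computer vision and deep learning. WANG Xupeng (王旭鹏, born 1986), male, from Yantai, Shandong, Ph.D.; research interests: computer vision and pattern recognition.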

Abstract: Fast and reliable hand pose estimation is widely used in fields such as human-computer interaction. To reduce the impact of illumination changes, self-occlusion, and large pose variations on hand pose estimation, a deep network framework based on label distribution learning is proposed. The network takes the hand point cloud as input and first normalizes it using farthest point sampling and an Oriented Bounding Box (OBB). PointNet++ then extracts features from the normalized point cloud. Because the mapping from the point cloud to the hand joint positions is highly non-linear, the joint positions are predicted by a label distribution learning network. Compared with traditional depth-map-based approaches, the proposed method extracts discriminative geometric features of the hand effectively, with low computational cost and high accuracy. The network was evaluated on the public MSRA dataset. Experimental results show that the average joint error is 8.43 mm and the average processing time is 12.8 ms per frame, and that the pose estimation error is 11.82% lower than that of 3D CNN and 0.83% lower than that of Hand PointNet.
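
The farthest point sampling step named in the abstract can be illustrated with a minimal sketch. This is a generic version of the standard greedy algorithm in plain Python, not the authors' implementation. The function name, the `start` parameter, and the toy point cloud are assumptions made for illustration, and the abstract does not state how many points are sampled. The OBB normalization, PointNet++ feature extraction, and label distribution learning stages are not shown.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling (illustrative helper, hypothetical name).

    Returns the indices of k points, each new point being the one farthest
    from the set selected so far. `start` is the index of the first point.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # min_dist[i]: squared distance from point i to its nearest selected point
    min_dist = [math.inf] * n
    for _ in range(k - 1):
        last = points[selected[-1]]
        # Update nearest-selected distances using the most recently added point
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # Choose the point that is farthest from everything selected so far
        selected.append(max(range(n), key=min_dist.__getitem__))
    return selected


# Toy cloud (made-up coordinates, not data from the paper)
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0),
         (0.0, 5.0, 0.0), (0.1, 0.0, 0.0)]
print(farthest_point_sampling(cloud, 3))  # → [0, 2, 3]
```

Sampling this way yields a fixed-size subset that covers the hand surface more evenly than uniform random sampling, which is why it is a common preprocessing step for point-cloud networks. The cost of this sketch is O(n·k) distance evaluations.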

Key words: 3D hand pose estimation, deep learning, Convolutional Neural Network (CNN), label distribution learning, point cloud data

