Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (2): 550-555. DOI: 10.11772/j.issn.1001-9081.2020050721

Special Issue: Multimedia Computing and Computer Simulation

• Multimedia computing and computer simulation •

3D hand pose estimation based on label distribution learning

LI Weiqiang, LEI Hang, ZHANG Jingyu, WANG Xupeng   

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 610054, China
  • Received: 2020-05-29; Revised: 2020-08-10; Online: 2021-02-10; Published: 2020-08-26
  • Supported by:
    This work is partially supported by the Science and Technology Program of Sichuan Province (2018GZDZX0040).

  • Corresponding author: LI Weiqiang
  • About the authors: LI Weiqiang, born in 1995 in Tianjin, is an M. S. candidate whose research interests include computer vision and deep learning; LEI Hang, born in 1960 in Zigong, Sichuan, is a professor and Ph. D. whose research interests include embedded software and trustworthy software testing; ZHANG Jingyu, born in 1998 in Heze, Shandong, is an M. S. candidate whose research interests include computer vision and deep learning; WANG Xupeng, born in 1986 in Yantai, Shandong, holds a Ph. D. and his research interests include computer vision and pattern recognition.

Abstract: Fast and reliable hand pose estimation has wide applications in fields such as human-computer interaction. To cope with the effects of illumination changes, self-occlusion and large pose variations on hand pose estimation, a deep network framework based on label distribution learning was proposed. The network took the hand point cloud as input, which was first normalized by farthest point sampling and an Oriented Bounding Box (OBB). PointNet++ was then used to extract features from the hand point cloud. To handle the highly non-linear mapping between the point cloud and the hand joints, the joint positions were predicted by a label distribution learning network. Compared with traditional depth-map-based approaches, the proposed method extracted discriminative hand geometric features efficiently, with low computational cost and high accuracy. A series of tests were conducted on the public MSRA dataset to verify the effectiveness of the proposed network. Experimental results showed that the average joint error of the proposed network was 8.43 mm and the average processing time per frame was 12.8 ms; the pose estimation error was reduced by 11.82% and 0.83% compared with 3D CNN and Hand PointNet, respectively.
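The abstract outlines the processing pipeline: farthest point sampling and OBB normalization of the hand point cloud, PointNet++ feature extraction, and a label distribution learning head that predicts the joint positions. The following Python sketch illustrates, under our own assumptions, the two preprocessing steps and the idea of encoding a joint coordinate as a discrete label distribution; the PointNet++ backbone and the actual network head are omitted, and all function names, the Gaussian width sigma and the bin count are illustrative placeholders rather than the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy FPS: repeatedly pick the point farthest from the ones already chosen."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)
    chosen[0] = np.random.randint(n)
    min_dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        # distance of every point to the most recently chosen point
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = np.argmax(min_dist)
    return points[chosen]

def obb_normalize(points):
    """Center the cloud, rotate it into its principal (OBB) axes via SVD,
    and scale by the longest box edge so coordinates lie roughly in [-0.5, 0.5]."""
    centered = points - points.mean(axis=0)
    _, _, axes = np.linalg.svd(centered, full_matrices=False)  # rows of `axes` are the OBB axes
    rotated = centered @ axes.T
    scale = (rotated.max(axis=0) - rotated.min(axis=0)).max()
    return rotated / scale

def joint_to_distribution(coord, bins=64, sigma=0.05):
    """Encode one normalized joint coordinate as a discrete Gaussian label
    distribution over `bins` cells instead of a single hard target value."""
    centers = np.linspace(-0.5, 0.5, bins)
    weights = np.exp(-(centers - coord) ** 2 / (2 * sigma ** 2))
    return weights / weights.sum()

def distribution_to_joint(dist, bins=64):
    """Decode a (predicted) distribution back to a coordinate via its expectation."""
    centers = np.linspace(-0.5, 0.5, bins)
    return float((dist * centers).sum())

if __name__ == "__main__":
    cloud = np.random.rand(4096, 3)                 # stand-in for a segmented hand point cloud
    sampled = farthest_point_sampling(cloud, 1024)  # fixed-size input for the network
    normalized = obb_normalize(sampled)             # what would be fed to PointNet++
    p = joint_to_distribution(0.12)
    print(normalized.shape, round(distribution_to_joint(p), 3))
```

Encoding each coordinate as a distribution and decoding the prediction by its expectation, rather than regressing a single value, gives the output head a smooth target over the discretized label space; the exact parameterization used in the paper may differ from this sketch.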

Key words: 3D hand pose estimation, deep learning, Convolutional Neural Network (CNN), label distribution learning, point cloud data

CLC Number: