3D point cloud head pose estimation based on deep learning

doi:10.11772/j.issn.1001-9081.2019081479

Abstract

Abstract: Fast and reliable head pose estimation is the basis of many high-level face analysis tasks. To address the sensitivity of existing algorithms to illumination changes, occlusions, and large pose variations, a new deep learning framework named HPENet is proposed. Taking point cloud data as input, the network first extracts feature points from the point cloud structure with the farthest point sampling algorithm. With each feature point as a center, the points inside spheres of several radii are grouped for subsequent feature description. A multi-layer perceptron and a max pooling layer then extract features from the point cloud, and a fully connected layer maps the extracted features to the predicted head pose. To verify the effectiveness of HPENet, experiments were carried out on the public Biwi Kinect Head Pose dataset. The results show that the errors of HPENet on the pitch, roll, and yaw angles are 2.3°, 1.5°, and 2.4°, respectively, and that the average time cost is 8 ms per frame. Compared with other leading algorithms, the proposed method performs better in terms of both accuracy and computational complexity.

Key words: head pose estimation, deep learning, Convolutional Neural Network (CNN), point cloud data
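
The sampling and grouping steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the HPENet implementation: the function names `farthest_point_sampling` and `ball_group`, the choice of starting index, and the example radii are assumptions made for illustration only, and a real pipeline would operate on batched tensors rather than lists of tuples.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return k point indices, each new one farthest from those already chosen.

    `points` is a list of (x, y, z) tuples; `start` is an assumed seed index.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next sample is the point farthest from the current sample set.
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


def ball_group(points, centroid_indices, radii):
    """For each centroid and each radius, list the indices of points in the ball.

    The result is indexed as groups[centroid][radius] -> list of point indices.
    """
    groups = []
    for c in centroid_indices:
        center = points[c]
        per_radius = []
        for r in radii:
            inside = [i for i, p in enumerate(points) if math.dist(p, center) <= r]
            per_radius.append(inside)
        groups.append(per_radius)
    return groups


if __name__ == "__main__":
    # Toy cloud with two well-separated clusters (hypothetical data).
    cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (10, 1, 0)]
    centroids = farthest_point_sampling(cloud, k=2)
    print(centroids)
    print(ball_group(cloud, centroids, radii=(1.5, 20.0)))
```

Grouping at several radii around the same center gives each feature point both a fine local neighborhood and a coarser one, which is the multi-scale description the abstract refers to. The per-group features would then be produced by a shared multi-layer perceptron followed by max pooling, which is not shown here.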

摘要： 快速、可靠的头部姿态估计算法是高级人脸分析任务的基础。为了解决现有算法存在的光照变化、遮挡、姿态尺度较大等问题，提出一种新的深度学习框架HPENet。该网络以点云数据为输入，首先通过最远点采样算法提取点云结构中的特征点，以特征点为球心，将不同半径的球体内的点构成多个分组，用于后续的特征描述；然后采用多层感知器和最大池化层实现点云的特征提取，提取的特征通过全连接层输出预测的头部姿态。为了验证HPENet的有效性，在公共数据集Biwi Kinect Head Pose上进行测试。实验结果显示，HPENet在俯仰角、侧倾角和偏航角上的误差分别为2.3°、1.5°、2.4°，平均每帧的时间消耗为8 ms。与其他优秀算法相比，所提方法在准确度和计算的复杂度方面都具有更好的性能。

关键词: 头部姿态估计, 深度学习, 卷积神经网络, 点云数据

CLC Number: TP391

XIAO Shihua, SANG Nan, WANG Xupeng. 3D point cloud head pose estimation based on deep learning[J]. Journal of Computer Applications, 2020, 40(4): 996-1001.

肖仕华, 桑楠, 王旭鹏. 基于深度学习的三维点云头部姿态估计[J]. 计算机应用, 2020, 40(4): 996-1001.
