Journal of Computer Applications, 2020, Vol. 40, Issue (4): 996-1001. DOI: 10.11772/j.issn.1001-9081.2019081479

• Artificial intelligence •

3D point cloud head pose estimation based on deep learning

XIAO Shihua, SANG Nan, WANG Xupeng   

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610000, China
  • Received: 2019-08-29 Revised: 2019-10-31 Online: 2020-04-10 Published: 2019-11-18

基于深度学习的三维点云头部姿态估计

肖仕华, 桑楠, 王旭鹏   

  1. 电子科技大学 信息与软件工程学院, 成都 610000
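  • Corresponding author: XIAO Shihua
  • About the authors: XIAO Shihua (born 1994), male, from Deyang, Sichuan, master's student; main research interests: computer vision, deep learning, image processing. SANG Nan (born 1964), male, from Yingshan, Sichuan, professor, master's degree; main research interests: embedded real-time high-dependability technology, embedded software engineering, middleware. WANG Xupeng (born 1986), male, from Yantai, Shandong, Ph.D.; main research interests: computer vision, pattern recognition.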

Abstract: A fast and reliable head pose estimation algorithm is the basis of many high-level face analysis tasks. To address the problems that existing algorithms have with illumination changes, occlusions and large pose variations, a new deep learning framework named HPENet was proposed. With point cloud data as input, feature points were first extracted from the point cloud structure by using the farthest point sampling algorithm. With each feature point as a center, the points within spheres of several radii were grouped for subsequent feature description. Then, a multi-layer perceptron and a max pooling layer were used to extract features from the point cloud, and the extracted features were passed through the fully connected layer to output the predicted head pose. To verify the effectiveness of HPENet, experiments were carried out on the public Biwi Kinect Head Pose dataset. Experimental results show that the errors of HPENet on the pitch, roll and yaw angles are 2.3°, 1.5° and 2.4° respectively, and the average time cost is 8 ms per frame. Compared with other leading algorithms, the proposed method performs better in terms of both accuracy and computational complexity.

Key words: head pose estimation, deep learning, Convolutional Neural Network (CNN), point cloud data
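
The sampling and grouping steps named in the abstract (farthest point sampling, followed by grouping points inside spheres of several radii around each sampled point) can be illustrated with a short NumPy sketch. This is not the authors' implementation of HPENet: the function names, the random test cloud, the number of centers (32) and the radii (0.2 and 0.4) are all assumptions chosen only for illustration.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Return k indices; each new index is the point farthest from those already chosen."""
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = start
    # min_dist[i] holds the squared distance from point i to its nearest chosen point.
    min_dist = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, k):
        nxt = int(np.argmax(min_dist))
        chosen[i] = nxt
        dist_to_new = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


def ball_groups(points, centers, radii):
    """For each radius, list the indices of the points lying within that radius of each center."""
    # Pairwise Euclidean distances, shape (num_centers, num_points).
    dist = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    return {r: [np.flatnonzero(row <= r) for row in dist] for r in radii}


# Hypothetical input: a random cloud standing in for a head scan.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))

idx = farthest_point_sampling(cloud, k=32)                  # assumed number of feature points
groups = ball_groups(cloud, cloud[idx], radii=(0.2, 0.4))   # assumed radii
```

In a network of the kind the abstract describes, each group would then be passed through a shared multi-layer perceptron and reduced with max pooling; that learned part is omitted here.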
