Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (6): 1667-1672. DOI: 10.11772/j.issn.1001-9081.2020091427

Special topic: Artificial Intelligence


Multi-angle head pose estimation method based on optimized LeNet-5 network

ZHANG Hui1, ZHANG Nana2, HUANG Jun1

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China;
  2. College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China
  • Received: 2020-09-15 Revised: 2020-12-03 Online: 2020-12-14 Published: 2021-06-10
  • Corresponding author: ZHANG Nana
  • About the authors: ZHANG Hui (born 1996), female, from Wenzhou, Zhejiang, is a master's student whose research interests include image processing and computer vision. ZHANG Nana (born 1979), female, from Laiyang, Shandong, is an associate professor with a master's degree whose research interests include image processing, computer vision and machine learning. HUANG Jun (born 1996), male, from Wenzhou, Zhejiang, is a master's student whose research interests include image processing and computer vision.
  • Supported by:
    This work is partially supported by the Shanghai Municipal Education Commission's "Morning Plan" Fund (AASH1702).

Abstract: Traditional head pose estimation methods have low accuracy, or fail entirely, when key facial feature points cannot be located because of partial occlusion or an excessively large angle. To address this problem, a multi-angle head pose estimation method based on an optimized LeNet-5 network was proposed. Firstly, the depth, the convolution kernel size and other parameters of the Convolutional Neural Network (CNN) were optimized to better capture the global features of the image. Then, the pooling layers were improved by replacing the pooling operations with convolution operations to enhance the nonlinear capability of the network. Finally, the AdaBound optimizer was introduced, and a Softmax regression model was used for pose classification training. During training, samples with hair occlusion, exaggerated expressions and glasses were added to the self-built dataset to improve the generalization ability of the network. Experimental results show that the proposed method does not need to locate key facial feature points and can estimate head pose under multi-angle rotations such as head up, head down and head tilt, even under occlusion by lighting shadows or hair. It reaches an accuracy of 98.7% on the Pointing04 and CAS-PEAL-R1 public datasets, with an average running speed of 22 to 29 frames per second.

Key words: head pose estimation, key facial feature point, LeNet-5 network, Convolutional Neural Network (CNN), pose classification
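
The abstract describes two mechanisms that a short calculation can make concrete: swapping a pooling layer for a convolution, and classifying poses with Softmax. The sketch below is only an illustration under stated assumptions, not the authors' network. It uses the classic LeNet-5 sizes (32x32 input, 5x5 kernels, 2x2 windows with stride 2), which the abstract does not confirm for the optimized model, and the helper names `conv_out_size` and `softmax` are hypothetical. It shows that a 2x2 stride-2 convolution yields the same feature-map size as 2x2 stride-2 pooling, so the swap keeps the layer shapes while adding learnable weights.

```python
import math
from typing import List


def conv_out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling window applied to an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed classic LeNet-5 layer sizes as (kernel, stride) pairs.
# The 2x2 stride-2 entries stand for either pooling or the convolution
# that replaces it; the size arithmetic is identical in both cases.
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

size = 32
trace = [size]
for kernel, stride in layers:
    size = conv_out_size(size, kernel, stride)
    trace.append(size)
print(trace)  # [32, 28, 14, 10, 5]

# Hypothetical pose classes and made-up scores, for illustration only.
poses = ["head up", "head down", "head tilt"]
probs = softmax([2.0, 1.0, 0.1])
predicted = poses[probs.index(max(probs))]
print(predicted)  # head up
```

The probabilities returned by `softmax` sum to 1, and the predicted pose is the class with the largest probability.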

CLC number: