Multi-angle head pose estimation method based on optimized LeNet-5 network

doi:10.11772/j.issn.1001-9081.2020091427

Journal of Computer Applications, 2021, Vol. 41, Issue 6: 1667-1672. DOI: 10.11772/j.issn.1001-9081.2020091427

Topic: Artificial Intelligence

ZHANG Hui¹, ZHANG Nana², HUANG Jun¹

1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China;
2. College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China

Received: 2020-09-15 Revised: 2020-12-03 Online: 2020-12-14 Published: 2021-06-10
Corresponding author: ZHANG Nana
About the authors: ZHANG Hui (born 1996), female, from Wenzhou, Zhejiang, master's student; main research interests: image processing, computer vision. ZHANG Nana (born 1979), female, from Laiyang, Shandong, associate professor, M.S.; main research interests: image processing, computer vision, machine learning. HUANG Jun (born 1996), male, from Wenzhou, Zhejiang, master's student; main research interests: image processing, computer vision.
Supported by:
This work is partially supported by the "Chenguang Program" ("Morning Plan") Fund of the Shanghai Municipal Education Commission (AASH1702).

Abstract

Abstract: Traditional head pose estimation methods lose accuracy, or fail outright, when the key facial feature points cannot be located because of partial occlusion or an excessively large rotation angle. To address this problem, a multi-angle head pose estimation method based on an optimized LeNet-5 network is proposed. First, the depth, the convolution kernel size, and other parameters of the Convolutional Neural Network (CNN) are optimized to better capture the global features of the image. Then, the pooling layers are improved: convolution operations replace the pooling operations to increase the nonlinear capacity of the network. Finally, the AdaBound optimizer is introduced, and a Softmax regression model is used for pose classification training. During training, samples with hair occlusion, exaggerated expressions, and glasses are added to the self-built dataset to improve the generalization ability of the network. Experimental results show that the proposed method does not need to locate key facial feature points and can estimate head pose under multi-angle rotations such as raising, lowering, and tilting the head, even under occlusion by lighting shadows and hair. The accuracy reaches 98.7% on the Pointing04 and CAS-PEAL-R1 public datasets, and the average running speed is 22 to 29 frames per second.
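To make the pooling replacement described in the abstract concrete, the following minimal Python sketch compares 2×2 max pooling with a stride-2 convolution followed by an activation. Both halve the spatial size of a feature map, but the convolution has weights that can be learned and its output passes through a nonlinearity. The function names, the 2×2 kernel, its averaging weights, and the ReLU activation are illustrative assumptions; the abstract does not specify the layer configuration actually used.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2D list of numbers (no padding)."""
    rows = (len(x) - size) // stride + 1
    cols = (len(x[0]) - size) // stride + 1
    return [
        [
            max(
                x[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def strided_conv2d(x, kernel, stride=2, bias=0.0):
    """Unpadded strided cross-correlation followed by ReLU.

    In a trained network the kernel and bias would be learned;
    here they are fixed, hypothetical values.
    """
    k = len(kernel)
    rows = (len(x) - k) // stride + 1
    cols = (len(x[0]) - k) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = bias + sum(
                kernel[i][j] * x[r * stride + i][c * stride + j]
                for i in range(k)
                for j in range(k)
            )
            row.append(max(0.0, total))  # ReLU nonlinearity
        out.append(row)
    return out


# A toy 4x4 feature map (hypothetical values).
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Fixed averaging weights stand in for learned parameters.
avg_kernel = [[0.25, 0.25], [0.25, 0.25]]

pooled = max_pool2d(feature_map)
downsampled = strided_conv2d(feature_map, avg_kernel)

# Both outputs are 2x2: the strided convolution downsamples like pooling.
print(pooled)
print(downsampled)
```

With learned weights in place of the fixed averaging kernel, the network can choose how to summarize each window instead of always taking the maximum, which is one common motivation for this substitution.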

Key words: head pose estimation, key facial feature point, LeNet-5 network, Convolutional Neural Network (CNN), pose classification
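The AdaBound optimizer named in the abstract clips an Adam-style per-parameter step size between a lower and an upper bound that both converge to a final learning rate, so training behaves like Adam early on and like SGD later. The sketch below is a simplified scalar version of that rule as described in the AdaBound paper by Luo et al. The hyperparameter values (`lr`, `betas`, `final_lr`, `gamma`) are common defaults assumed for illustration, not the settings reported by the authors, and the toy objective is hypothetical.

```python
import math


def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower and upper learning-rate bounds at a given step (step >= 1).

    Both bounds approach final_lr as step grows.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def adabound_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                  eps=1e-8, final_lr=0.1, gamma=1e-3):
    """One AdaBound update for a single scalar parameter (simplified)."""
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]

    # Exponential moving averages of the gradient and its square, as in Adam.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad

    # Adam step size with bias correction folded in.
    step_size = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)

    # Clip the per-parameter learning rate into the dynamic bounds.
    lower, upper = adabound_bounds(t, final_lr, gamma)
    per_param_lr = step_size / (math.sqrt(state["v"]) + eps)
    clipped_lr = min(max(per_param_lr, lower), upper)

    return param - clipped_lr * state["m"]


# Toy objective f(x) = x^2 with gradient 2x (hypothetical example).
x = 5.0
state = {"step": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    x = adabound_step(x, 2.0 * x, state)

print(adabound_bounds(1))
print(adabound_bounds(10 ** 6))
print(x)
```

The bounds start far apart, so the early updates are effectively unconstrained Adam steps, and they tighten around `final_lr` as the step count grows.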

CLC number: TP391.41

ZHANG Hui, ZHANG Nana, HUANG Jun. Multi-angle head pose estimation method based on optimized LeNet-5 network[J]. Journal of Computer Applications, 2021, 41(6): 1667-1672.

