Real-time facial expression recognition based on convolutional neural network with multi-scale kernel feature

doi:10.11772/j.issn.1001-9081.2019030540

Journal of Computer Applications, 2019, Vol. 39, Issue 9: 2568-2574.

LI Minze¹, LI Xiaoxia¹,², WANG Xueyuan¹,², SUN Wei¹

1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China;
2. Key Laboratory of Special Environmental Robotics in Sichuan Province (Southwest University of Science and Technology), Mianyang, Sichuan 621010, China

Received: 2019-04-03 Revised: 2019-06-07 Online: 2019-09-10 Published: 2019-06-10
Corresponding author: LI Xiaoxia
About the authors: LI Minze (born 1992), male, from Nanchong, Sichuan, M.S. candidate, CCF member; research interests: deep learning and computer vision. LI Xiaoxia (born 1976), female, from Anyue, Sichuan, professor, Ph.D.; research interests: pattern recognition and computer vision. WANG Xueyuan (born 1974), male, from Mianyang, Sichuan, associate professor, Ph.D.; research interests: image processing. SUN Wei (born 1995), male, from Dazhou, Sichuan, M.S. candidate; research interests: image processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61771411), the Sichuan Science and Technology Project (2019YJ0449), and the Graduate Innovation Fund of Southwest University of Science and Technology (18ycx123).

Abstract

To address the insufficient generalization ability, poor stability, and low speed that keep facial expression recognition from meeting real-time requirements, a real-time facial expression recognition method based on a convolutional neural network with multi-scale kernel features was proposed. Firstly, a lightweight face detection network, MSSD (an improved combination of MobileNet and the Single Shot multiBox Detector), was proposed, and the coordinates of the detected faces were tracked by a Kernel Correlation Filter (KCF) model to improve detection speed and stability. Then, linear bottleneck layers with three different convolution kernel scales were used to form three branches, and the branches were fused by channel concatenation into a multi-scale kernel convolution unit, whose diverse features were used to improve the accuracy of expression recognition. Finally, to improve the generalization ability of the model and prevent over-fitting, different linear transformations were used for data augmentation to enlarge the dataset, and the model trained on the FER-2013 facial expression dataset was transferred to the small-sample CK+ dataset for retraining. The experimental results show that the recognition rate of the proposed method on the FER-2013 dataset reaches 73.0%, which is 1.8 percentage points higher than that of the Kaggle Expression Recognition Challenge champion, and that its recognition rate on the CK+ dataset reaches 99.5%. For 640×480 video, the face detection speed of the proposed method reaches 158 frames per second, which is 6.3 times that of the mainstream face detection network MTCNN (Multi-Task Cascaded Convolutional Neural Network), and the overall speed of face detection plus expression recognition reaches 78 frames per second. The proposed method can therefore achieve fast and accurate facial expression recognition.

Key words: facial expression recognition (FER), convolutional neural network (CNN), face detection, Kernel Correlation Filter (KCF), transfer learning
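
The multi-scale kernel convolution unit described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel sizes (1, 3, 5), the averaging weights, and the names `correlate_same`, `box_kernel` and `multi_scale_unit` are assumptions made for illustration, and the 1×1 expansion and projection layers of a real linear bottleneck are omitted. The sketch only shows the structural idea of passing the same feature map through three branches with different kernel sizes and fusing the results by channel concatenation.

```python
# Illustrative sketch only: kernel sizes and averaging weights are assumptions,
# and the 1x1 expansion/projection of a real linear bottleneck is omitted.


def correlate_same(channel, kernel):
    """Slide a square, odd-sized kernel over one H x W channel with zero
    padding, so the output keeps the input's height and width."""
    height, width = len(channel), len(channel[0])
    size = len(kernel)
    pad = size // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(size):
                for dj in range(size):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the channel count as zeros.
                    if 0 <= y < height and 0 <= x < width:
                        acc += channel[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def box_kernel(size):
    """Stand-in for learned weights: a size x size averaging kernel."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_scale_unit(feature_map, kernel_sizes=(1, 3, 5)):
    """Apply one branch per kernel size to every channel (depthwise style)
    and fuse the branches by concatenating their channels."""
    fused = []
    for size in kernel_sizes:
        kernel = box_kernel(size)
        branch = [correlate_same(channel, kernel) for channel in feature_map]
        fused.extend(branch)  # channel concatenation
    return fused
```

With a C-channel input, the fused output has 3C channels of the same height and width, so a following layer sees responses from all three receptive-field sizes side by side.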

CLC number: TP391.4

Cite this article:

LI Minze, LI Xiaoxia, WANG Xueyuan, SUN Wei. Real-time facial expression recognition based on convolutional neural network with multi-scale kernel feature[J]. Journal of Computer Applications, 2019, 39(9): 2568-2574.
