Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (9): 2568-2574. DOI: 10.11772/j.issn.1001-9081.2019030540

• Artificial Intelligence •

Real-time facial expression recognition based on convolutional neural network with multi-scale kernel feature

LI Minze<sup>1</sup>, LI Xiaoxia<sup>1,2</sup>, WANG Xueyuan<sup>1,2</sup>, SUN Wei<sup>1</sup>

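  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China;
    2. Key Laboratory of Special Environmental Robotics in Sichuan Province (Southwest University of Science and Technology), Mianyang, Sichuan 621010, China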
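  • Received: 2019-04-03 Revised: 2019-06-07 Online: 2019-09-10 Published: 2019-06-10
  • Corresponding author: LI Xiaoxia
  • About the authors: LI Minze (1992-), male, a native of Nanchong, Sichuan, M.S. candidate, CCF member; his research interests include deep learning and computer vision. LI Xiaoxia (1976-), female, a native of Anyue, Sichuan, professor, Ph.D.; her research interests include pattern recognition and computer vision. WANG Xueyuan (1974-), male, a native of Mianyang, Sichuan, associate professor, Ph.D.; his research interests include image processing. SUN Wei (1995-), male, a native of Dazhou, Sichuan, M.S. candidate; his research interests include image processing.
  • Supported by:

    National Natural Science Foundation of China (61771411); Sichuan Science and Technology Project (2019YJ0449); Graduate Innovation Fund of Southwest University of Science and Technology (18ycx123).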

Real-time facial expression recognition based on convolutional neural network with multi-scale kernel feature

LI Minze<sup>1</sup>, LI Xiaoxia<sup>1,2</sup>, WANG Xueyuan<sup>1,2</sup>, SUN Wei<sup>1</sup>   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China;
    2. Key Laboratory of Special Environmental Robotics in Sichuan Province (Southwest University of Science and Technology), Mianyang, Sichuan 621010, China
  • Received: 2019-04-03 Revised: 2019-06-07 Online: 2019-09-10 Published: 2019-06-10
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61771411), the Sichuan Science and Technology Project (2019YJ0449), and the Graduate Innovation Fund of Southwest University of Science and Technology (18ycx123).

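Abstract:

To address the insufficient generalization ability, poor stability, and low speed of facial expression recognition, which make real-time requirements hard to meet, a real-time facial expression recognition method based on a multi-scale kernel feature convolutional neural network is proposed. First, a lightweight face detection network, MSSD, is proposed, which combines an improved MobileNet with the Single Shot multiBox Detector, and a Kernel Correlation Filter (KCF) model tracks the coordinates of the detected face to improve detection speed and stability. Next, linear bottleneck layers with convolution kernels of three different scales form three branches, which are fused by channel concatenation into a multi-scale kernel convolution unit whose diverse features improve expression recognition accuracy. Finally, to improve generalization and prevent overfitting, the dataset is augmented with different linear transformations, and the model trained on the FER-2013 facial expression dataset is transferred to the small-sample CK+ dataset for retraining. Experimental results show that the proposed method reaches a recognition rate of 73.0% on FER-2013, 1.8% higher than the winner of the Kaggle facial expression recognition challenge, and 99.5% on CK+. For 640×480 video, face detection runs at 158 frames per second, 6.3 times the speed of the mainstream face detection network MTCNN (Multi-Task Cascaded Convolutional Neural Network), and face detection plus expression recognition together run at 78 frames per second. The proposed method therefore achieves fast and accurate facial expression recognition.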
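
As a rough illustration of the detect-then-track idea in the abstract, the sketch below runs a detector only occasionally and lets a tracker carry the face box between detections. The `detect` and `track` callables, the `max_tracked_frames` parameter, and its default value are hypothetical stand-ins, not the authors' MSSD or KCF code; the abstract does not say how often detection is re-run.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def detect_then_track(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    track: Callable[[object, Box], Optional[Box]],
    max_tracked_frames: int = 10,  # assumed value; not given in the abstract
) -> List[Optional[Box]]:
    """Return one face box (or None) per frame.

    The detector runs on the first frame, after `max_tracked_frames`
    consecutive tracked frames, and whenever the previous frame has no box
    (for example, the tracker lost the face). The tracker handles the rest.
    """
    boxes: List[Optional[Box]] = []
    last_box: Optional[Box] = None
    tracked_in_a_row = 0
    for frame in frames:
        if last_box is None or tracked_in_a_row >= max_tracked_frames:
            # Expensive path: full face detection on this frame.
            last_box = detect(frame)
            tracked_in_a_row = 0
        else:
            # Cheap path: update the previous box with the tracker.
            last_box = track(frame, last_box)
            tracked_in_a_row += 1
        boxes.append(last_box)
    return boxes
```

With stub callables, 12 frames and `max_tracked_frames=5` would call the detector twice and the tracker ten times, which is where the speed-up of such a scheme would come from if tracking is cheaper than detection.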

Keywords: facial expression recognition, convolutional neural network, face detection, kernel correlation filter, transfer learning

Abstract:

Aiming at the problems of insufficient generalization ability, poor stability, and low speed that make it difficult for facial expression recognition to meet real-time requirements, a real-time facial expression recognition method based on a multi-scale kernel feature convolutional neural network was proposed. Firstly, an improved MSSD (MobileNet+Single Shot multiBox Detector) lightweight face detection network was proposed, and the coordinates of the detected face were tracked by a Kernel Correlation Filter (KCF) model to improve detection speed and stability. Then, linear bottleneck layers with convolution kernels of three different scales were used to form three branches, the branches were fused by channel concatenation to form a multi-scale kernel convolution unit, and the diversity of its features was used to improve the accuracy of expression recognition. Finally, in order to improve the generalization ability of the model and prevent over-fitting, different linear transformations were used for data augmentation to enlarge the dataset, and the model trained on the FER-2013 facial expression dataset was transferred to the small-sample CK+ dataset for retraining. The experimental results show that the recognition rate of the proposed method on the FER-2013 dataset reaches 73.0%, which is 1.8% higher than that of the Kaggle Expression Recognition Challenge champion, and that its recognition rate on the CK+ dataset reaches 99.5%. For 640×480 video, the face detection speed of the proposed method reaches 158 frames per second, which is 6.3 times that of the mainstream face detection network MTCNN (MultiTask Cascaded Convolutional Neural Network). At the same time, the overall speed of face detection and expression recognition reaches 78 frames per second. Therefore, the proposed method can achieve fast and accurate facial expression recognition.
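
The channel-concatenation fusion of three branches with different kernel scales can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the kernel sizes 3, 5 and 7 are not given in the abstract, and each branch is reduced to a fixed k×k averaging filter per channel, whereas a real linear bottleneck would use learned 1×1 and depthwise weights. The names `depthwise_box_filter` and `multi_scale_unit` are hypothetical.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def depthwise_box_filter(x: FeatureMap, k: int) -> FeatureMap:
    """Apply a k x k averaging kernel to each channel with zero 'same' padding.

    Stand-in for one branch; k is assumed odd so the padding is symmetric.
    """
    pad = k // 2
    out: FeatureMap = []
    for channel in x:
        h, w = len(channel), len(channel[0])
        filtered = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total = 0.0
                for di in range(-pad, pad + 1):
                    for dj in range(-pad, pad + 1):
                        r, c = i + di, j + dj
                        if 0 <= r < h and 0 <= c < w:  # zeros outside the map
                            total += channel[r][c]
                filtered[i][j] = total / (k * k)
        out.append(filtered)
    return out


def multi_scale_unit(x: FeatureMap, kernel_sizes: Sequence[int] = (3, 5, 7)) -> FeatureMap:
    """Run one branch per kernel size and concatenate along the channel axis."""
    fused: FeatureMap = []
    for k in kernel_sizes:
        fused.extend(depthwise_box_filter(x, k))
    return fused
```

Because every branch keeps the spatial size unchanged, an input with C channels yields 3C output channels here, and each group of C channels reflects a different receptive field.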

Key words: Facial Expression Recognition (FER), Convolutional Neural Network (CNN), face detection, Kernel Correlation Filter (KCF), transfer learning

CLC number: