Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (4): 990-995. DOI: 10.11772/j.issn.1001-9081.2019081438

• Artificial intelligence •

Real-time facial expression and gender classification based on depthwise separable convolutional neural network

LIU Shangwang, LIU Chengwei, ZHANG Aili

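  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
  • Received: 2019-08-19 Revised: 2019-11-01 Publication date: 2020-04-10 Release date: 2019-11-25
  • Corresponding author: LIU Shangwang
  • About the authors: LIU Shangwang (born 1973), male, from Xinxiang, Henan, associate professor, Ph.D., CCF member; main research interests: biological image processing and computer vision. LIU Chengwei (born 1996), male, from Xinyang, Henan, master's student; main research interests: computer vision and deep learning. ZHANG Aili (born 1966), female, from Huaxian, Henan, professor; main research interests: signal processing, communication and networks.
  • Funding:
    Key Science and Technology Project of Henan Province (192102210290); Basic Research Plan of Key Scientific Research Projects of Colleges and Universities of Henan Province (18A510014).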

Real-time facial expression and gender recognition based on depthwise separable convolutional neural network

LIU Shangwang, LIU Chengwei, ZHANG Aili   

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
  • Received: 2019-08-19 Revised: 2019-11-01 Online: 2020-04-10 Published: 2019-11-25
  • Supported by:
    This work is partially supported by the Key Science and Technology Project of Henan Province (192102210290) and the Basic Research Plan of Key Scientific Research Project of Colleges and Universities of Henan Province (18A510014).

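Abstract: To address the complicated training process, excessive time consumption, and poor real-time performance of ordinary Convolutional Neural Networks (CNN) in expression and gender recognition tasks, a real-time facial expression and gender recognition model based on a depthwise separable convolutional neural network is proposed. First, the Multi-Task Cascaded Convolutional Network (MTCNN) is used to detect faces in input images of different scales, and a Kernelized Correlation Filter (KCF) tracks the detected face positions to increase detection speed. Then, bottleneck layers with convolution kernels of different scales are set up, kernel convolution units are formed through feature fusion by channel merging, and a depthwise separable convolutional neural network with residual blocks and separable convolution units extracts diversified features while reducing the number of parameters and making the model structure lightweight; backpropagation visualization enabled in real time is used to reveal the dynamic changes of the weights and to evaluate the learned features. Finally, the expression recognition and gender recognition networks are fused in parallel to recognize expression and gender in real time. Experimental results show that the proposed network model achieves a recognition rate of 73.8% on the FER-2013 dataset, a recognition rate of 96% on the CK+ dataset, and a gender classification accuracy of 96% on the IMDB dataset; the overall processing frame rate of the model reaches 80 frame/s, an improvement of 1.5 times over the result of the fully connected convolutional neural network method combined with a support vector machine. Therefore, for datasets that differ widely in quantity, resolution, and size, the network model detects quickly, trains in a short time, extracts features simply, and achieves a high recognition rate and good real-time performance.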

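Keywords: depthwise separable convolutional neural network, face detection, gender classification, emotion classification, feature extraction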

Abstract: To address the complicated training process, long running time, and poor real-time performance of common Convolutional Neural Networks (CNN) in expression and gender recognition tasks, a real-time facial expression and gender recognition model based on a depthwise separable convolutional neural network was proposed. Firstly, the Multi-Task Cascaded Convolutional Network (MTCNN) was used to detect faces in input images of different scales, and the detected face positions were tracked by a Kernelized Correlation Filter (KCF) to increase the detection speed. Then, bottleneck layers with convolution kernels of different scales were set, kernel convolution units were formed by feature fusion through channel combination, diversified features were extracted by the depthwise separable convolutional neural network with residual blocks and separable convolution units, and the number of parameters was reduced to make the model structure lightweight. In addition, backpropagation visualization enabled in real time was used to reveal the dynamic changes of the weights and to evaluate the learned features. Finally, the expression recognition and gender recognition networks were combined in parallel to realize real-time recognition of expression and gender. Experimental results show that the proposed network model achieves a recognition rate of 73.8% on the FER-2013 dataset, a recognition rate of 96% on the CK+ dataset, and a gender classification accuracy of 96% on the IMDB dataset; the overall processing speed of the model reaches 70 frames per second, an improvement of 1.5 times over the method of a common convolutional neural network combined with a support vector machine. Therefore, for datasets with large differences in quantity, resolution, and size, the proposed network model offers fast detection, short training time, simple feature extraction, a high recognition rate, and good real-time performance.

Key words: depthwise separable convolutional neural network, face detection, gender recognition, facial expression recognition, feature extraction
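
The abstract states that separable convolution units reduce the number of parameters. As a rough illustration of why that holds in general, the sketch below counts the weights of a standard k×k convolution and of a depthwise separable one (a k×k depthwise convolution followed by a 1×1 pointwise convolution). The function names and the layer sizes (3×3 kernel, 64 input channels, 128 output channels) are assumptions chosen for illustration only; they are not taken from the paper's architecture, and bias terms are omitted.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable / standard weight count; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Illustrative layer sizes (assumed, not from the paper).
    k, c_in, c_out = 3, 64, 128
    print("standard: ", standard_conv_params(k, c_in, c_out))
    print("separable:", depthwise_separable_params(k, c_in, c_out))
    print("ratio:    ", round(reduction_ratio(k, c_in, c_out), 4))
```

For these assumed sizes the separable layer needs roughly one ninth of the weights, since the ratio 1/c_out + 1/k² is dominated by the 1/k² term when the output has many channels.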

CLC number: