Official website of Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 750-756. DOI: 10.11772/j.issn.1001-9081.2021040807

• 2021 China Computer Federation Conference on Artificial Intelligence (CCFAI 2021) •

EE-GAN: facial expression recognition method based on generative adversarial network and network integration

Dingkang YANG1,2,3, Shuai HUANG1,2,3, Shunli WANG1,2,3, Peng ZHAI1,2,3, Yidan LI1,2,3, Lihua ZHANG1,2,3,4,5

  1. Academy for Engineering & Technology, Fudan University, Shanghai 200433, China
    2. Shanghai Engineering Research Center of AI & Robotics, Shanghai 200433, China
    3. Engineering Research Center of AI & Robotics, Ministry of Education, Shanghai 200433, China
    4. Ji Hua Laboratory, Foshan, Guangdong 528200, China
    5. Artificial Intelligence and Unmanned Systems Engineering Research Center of Jilin Province, Changchun, Jilin 130000, China
  • Received: 2021-05-18 Revised: 2021-07-06 Accepted: 2021-07-09 Online: 2021-11-09 Published: 2022-03-10
  • Contact: Lihua ZHANG
  • About author: YANG Dingkang, born in 1996 in Chenggu, Shaanxi, Ph. D. candidate. His research interests include computer vision, multimodal emotion recognition and affective computing.
    HUANG Shuai, born in 1998 in Fuyang, Anhui, M. S. candidate. His research interests include behavior recognition and emotion recognition.
    WANG Shunli, born in 1998 in Shijiazhuang, Hebei, Ph. D. candidate. His research interests include human action analysis and action quality assessment.
    ZHAI Peng, born in 1992 in Yangquan, Shanxi, Ph. D. candidate. His research interests include artificial intelligence and reinforcement learning.
    LI Yidan, born in 1998 in Yuanping, Shanxi, M. S. candidate. Her research interests include image processing and computational imaging.
  • Supported by:
    National Natural Science Foundation of China (82090052); Shanghai Municipal Science and Technology Major Project (2021SHZDZX0103)

Abstract:

Real-life scenes vary widely, and the emotions people express differ across scenes, so the labels in collected emotion datasets are unevenly distributed. Moreover, most traditional methods rely on model pre-training and feature engineering to strengthen the representation of expression-related features, but they ignore the complementarity between different feature representations, which limits the generalization and robustness of the model. To address these issues, EE-GAN, an end-to-end deep learning framework containing the network integration model Ens-Net, was proposed. On the one hand, features of different depths and regions obtained by multiple heterogeneous networks were considered, features with different semantics and at different levels were fused, and network integration was used to improve the learning ability of the model. On the other hand, facial images with specific expression labels were generated by a generative adversarial network, augmenting the data while balancing the distribution of expression labels. Qualitative and quantitative experiments on the CK+, FER2013 and JAFFE datasets demonstrate the effectiveness of the proposed method: compared with view learning methods including Locality Preserving Projections (LPP), EE-GAN achieves the highest facial expression recognition accuracies, reaching 82.1%, 84.8% and 91.5% on the three datasets respectively; compared with traditional Convolutional Neural Network (CNN) models such as AlexNet, VGG and ResNet, it improves the accuracy by at least 9 percentage points.
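Two mechanisms in the abstract can be made concrete with short sketches. First, the label-balancing augmentation: a generator conditioned on an expression label can synthesize extra faces for under-represented classes. Below is a minimal PyTorch sketch of such a conditional generator, not the authors' EE-GAN implementation; the seven-class setup, the 48×48 grayscale output, and every layer size and name (e.g. CondGenerator) are illustrative assumptions.

```python
# Hypothetical conditional-GAN generator for balancing expression labels.
# All sizes and names are assumptions for illustration, not the paper's EE-GAN.
import torch
import torch.nn as nn

NUM_CLASSES = 7   # assumed: seven basic expression categories (FER2013-style)
LATENT_DIM = 100  # assumed noise dimension

class CondGenerator(nn.Module):
    """Maps (noise, one-hot expression label) to a 48x48 grayscale face."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(LATENT_DIM + NUM_CLASSES, 256, 6, 1, 0),  # 1x1 -> 6x6
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                       # -> 12x12
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                        # -> 24x24
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1),                          # -> 48x48
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenate noise with the one-hot label, then reshape to a 1x1 map.
        onehot = nn.functional.one_hot(labels, NUM_CLASSES).float()
        x = torch.cat([z, onehot], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)

# After adversarial training, over-sample a minority class to flatten the
# label distribution before training the recognizer.
gen = CondGenerator().eval()
with torch.no_grad():
    z = torch.randn(64, LATENT_DIM)
    labels = torch.full((64,), 1, dtype=torch.long)  # assumed minority-class index
    fake_faces = gen(z, labels)                      # -> (64, 1, 48, 48)
```

Second, the network-integration idea: heterogeneous backbones extract features at different depths and over different regions, and their outputs are fused before a shared classifier. Again a rough sketch under assumed choices (a ResNet-18 branch, a VGG-11 branch, fusion by concatenation); the actual Ens-Net may differ in both its branches and its fusion strategy.

```python
# Hypothetical two-branch fusion model illustrating network integration.
import torch
import torch.nn as nn
from torchvision import models

class FusionEnsemble(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Drop the final fc layer; keep the 512-d pooled residual features.
        self.branch_a = nn.Sequential(*list(resnet.children())[:-1])
        vgg = models.vgg11(weights=None)
        # Pool the VGG feature maps to another 512-d vector.
        self.branch_b = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, x):
        fa = self.branch_a(x).flatten(1)  # deeper residual features
        fb = self.branch_b(x).flatten(1)  # shallower VGG-style features
        return self.classifier(torch.cat([fa, fb], dim=1))

model = FusionEnsemble()
logits = model(torch.randn(2, 3, 224, 224))  # assumed RGB 224x224 input -> (2, 7)
```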

Key words: facial expression recognition, Generative Adversarial Network (GAN), network integration, uneven label distribution, feature fusion

CLC number: