Facial attribute estimation and expression recognition based on contextual channel attention mechanism

doi:10.11772/j.issn.1001-9081.2024010098

Journal of Computer Applications, 2025, Vol. 45, Issue (1): 253-260.

Section: Multimedia computing and computer simulation

Jie XU¹, Yong ZHONG², Yang WANG³, Changfu ZHANG⁴, Guanci YANG¹,³

1. Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education (Guizhou University), Guiyang Guizhou 550025, China
2. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610213, China
3. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
4. School of Mechanical Engineering, Guizhou University, Guiyang Guizhou 550025, China

Received: 2024-01-26  Revised: 2024-03-28  Accepted: 2024-04-01  Online: 2024-05-09  Published: 2025-01-10
Contact: Guanci YANG
About the authors:
XU Jie, born in 1997, from Fuyang, Anhui, M. S. candidate, CCF member. His research interests include intelligent autonomous systems.
ZHONG Yong, born in 1966, from Yuechi, Sichuan, Ph. D., research fellow. His research interests include big data and its intelligent processing, cloud computing, and software engineering.
WANG Yang, born in 1987, from Hebi, Henan, Ph. D., senior engineer. His research interests include artificial intelligence, computer vision, and intelligent analysis of big data.
ZHANG Changfu, born in 1990, from Weng'an, Guizhou, senior engineer. His research interests include industrial big data and artificial intelligence.
Supported by:
National Natural Science Foundation of China (62373116); Guizhou Province Science and Technology Program (Qiankehe Zhicheng [2023] Yiban 118, Qiankehe Pingtairencai [2020] 6007-2)

Abstract

Facial features carry rich information and are valuable for facial attribute and expression analysis, but their diversity and complexity make facial analysis tasks difficult. To address this issue, a model of Facial Attribute estimation and Expression Recognition based on contextual channel attention mechanism (FAER) was proposed from the perspective of fine-grained facial features. Firstly, a local feature encoding backbone network based on ConvNeXt was constructed; because this backbone encodes local features effectively, it adequately represents the differences among local facial features. Secondly, a Contextual Channel Attention (CC Attention) mechanism was proposed. By dynamically and adaptively adjusting the weights of the feature channels, it lets the deep features represent both global and local information, compensating for the backbone network's limited ability to encode global features. Finally, different classification strategies were designed: for the Facial Attribute Estimation (FAE) and Facial Expression Recognition (FER) tasks, different combinations of loss functions were employed to encourage the model to learn more fine-grained facial features. Experimental results show that the proposed model achieves an average accuracy of 91.87% on the facial attribute dataset CelebA (CelebFaces Attributes), 0.55 percentage points higher than the second-best model, SwinFace (Swin transformer for Face). On the facial expression datasets RAF-DB and AffectNet, it achieves accuracies of 91.75% and 66.66% respectively, 0.84 and 0.43 percentage points higher than the second-best model, TransFER (Transformers for Facial Expression Recognition).
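The abstract describes CC Attention only as dynamically reweighting feature channels; its exact formulation is not given on this page. As a rough illustration of that channel-reweighting idea, the sketch below implements the classic Squeeze-and-Excitation pattern (ref. [39]) on a nested-list feature map of shape C × H × W: average-pool each channel into a global descriptor, pass the descriptors through a small bottleneck to get one weight per channel, and rescale the channels. It is not the authors' CC Attention, and the weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters.

```python
import math
from typing import List

# Feature map indexed as features[channel][row][col].
FeatureMap = List[List[List[float]]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def se_channel_attention(features: FeatureMap,
                         w1: List[List[float]],
                         w2: List[List[float]]) -> FeatureMap:
    """SE-style channel reweighting (ref. [39]), not the paper's CC Attention.

    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Both are hypothetical placeholders for parameters a network would learn.
    """
    # Squeeze: global average pooling gives one context value per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: linear -> ReLU -> linear -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: every spatial position of channel c is multiplied by weights[c].
    return [[[v * weights[c] for v in row] for row in ch]
            for c, ch in enumerate(features)]


# Two 2 x 2 channels with reduction ratio r = 2 (so C // r = 1).
feats = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
out = se_channel_attention(feats, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
```

Because each weight is computed from globally pooled statistics, the scale applied to a channel depends on the whole feature map, which is the general sense in which channel attention brings global context into locally computed convolutional features.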

Key words: Facial Attribute Estimation (FAE), Facial Expression Recognition (FER), attention mechanism, fine-grained feature, feature difference

CLC Number: TP18

Jie XU, Yong ZHONG, Yang WANG, Changfu ZHANG, Guanci YANG. Facial attribute estimation and expression recognition based on contextual channel attention mechanism[J]. Journal of Computer Applications, 2025, 45(1): 253-260.

References

1	ZHANG X H, TIAN Q C, LIAN L, et al. Review of research on facial landmark detection [J]. Computer Engineering and Applications, 2024, 60(12): 48-60. (in Chinese)
2	ZHANG B, LAN Y T, XIAN H, et al. Research on robot interaction of facial expression recognition based on channel attention mechanism [J]. Electronic Measurement Technology, 2021, 44(11): 169-174. (in Chinese)
3	LIN J, LI Y, YANG G. FPGAN: face de-identification method with generative adversarial networks for social robots [J]. Neural Networks, 2021, 133: 132-147.
4	LIU Z, LUO P, WANG X, et al. Large-scale CelebFaces attributes (CelebA) dataset [EB/OL]. [2023-10-22].
5	LIU Z, LUO P, WANG X, et al. Deep learning face attributes in the wild [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 3730-3738.
6	HAN H, JAIN A K, WANG F, et al. Heterogeneous face attribute estimation: a deep multi-task learning approach [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(11): 2597-2609.
7	HE S, LUO H, WANG P, et al. TransReID: Transformer-based object re-identification [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 14993-15002.
8	NGUYEN H M, LY N Q, PHUNG T T T. Large-scale face image retrieval system at attribute level based on facial attribute ontology and deep neuron network [C]// Proceedings of the 2018 Asian Conference on Intelligent Information and Database Systems, LNCS 10752. Cham: Springer, 2018: 539-549.
9	CAO J, LI Y, ZHANG Z. Partially shared multi-task convolutional neural network with local constraint for face attribute learning [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4290-4299.
10	QIN L, WANG M, DENG C, et al. SwinFace: a multi-task transformer for face recognition, expression recognition, age estimation and attribute estimation [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(4): 2223-2234.
11	LI W, CAO Z, FENG J, et al. Label2Label: a language modeling framework for multi-attribute learning [C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13672. Cham: Springer, 2022: 562-579.
12	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-10-20].
13	DAI G Q, ZHANG S L, YUAN Y B. Aged facial data extraction method by using skin color saliency [J]. Journal of Computer Applications, 2022, 42(S2): 217-223. (in Chinese)
14	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
15	LIU Z, MAO H, WU C Y, et al. A ConvNet for the 2020s [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11966-11976.
16	EKMAN P, FRIESEN W V. Constants across cultures in the face and emotion [J]. Journal of Personality and Social Psychology, 1971, 17(2): 124-129.
17	MOLLAHOSSEINI A, HASANI B, MAHOOR M H. AffectNet: a database for facial expression, valence, and arousal computing in the wild [J]. IEEE Transactions on Affective Computing, 2019, 10(1): 18-31.
18	LI S, DENG W, DU J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2584-2593.
19	WEN Y, ZHANG K, LI Z, et al. A discriminative feature learning approach for deep face recognition [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 499-515.
20	WAN W, ZHONG Y, LI T, et al. Rethinking feature distribution for loss functions in image classification [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 9117-9126.
21	FARZANEH A H, QI X. Facial expression recognition in the wild via deep attentive center loss [C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 2401-2410.
22	ZHANG Y, WANG C, LING X, et al. Learn from all: erasing attention consistency for noisy label facial expression recognition [C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13686. Cham: Springer, 2022: 418-434.
23	LIU X W, GONG X Y, ZHAO H X, et al. Dynamic facial expression recognition based on hybrid attention mechanism [J]. Journal of Computer Applications, 2023, 43(S1): 1-7. (in Chinese)
24	FERNANDEZ P D M, PEÑA F A G, REN T I, et al. FERAtt: facial expression recognition with attention net [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2019: 837-846.
25	WANG K, PENG X, YANG J, et al. Region attention networks for pose and occlusion robust facial expression recognition [J]. IEEE Transactions on Image Processing, 2020, 29: 4057-4069.
26	MA F, SUN B, LI S. Facial expression recognition with visual transformers and attentional selective fusion [J]. IEEE Transactions on Affective Computing, 2023, 14(2): 1236-1248.
27	XUE F, WANG Q, GUO G. TransFER: learning relation-aware facial expression representations with Transformers [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3581-3590.
28	VO T H, LEE G S, YANG H J, et al. Pyramid with super resolution for in-the-wild facial expression recognition [J]. IEEE Access, 2020, 8: 131988-132001.
29	RUDD E M, GÜNTHER M, BOULT T E. MOON: a mixed objective optimization network for the recognition of facial attributes [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9909. Cham: Springer, 2016: 19-35.
30	HAND E M, CHELLAPPA R. Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification [C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017: 4068-4074.
31	ZHUANG N, YAN Y, CHEN S, et al. Multi-task learning of cascaded CNN for facial attribute classification [C]// Proceedings of the 24th International Conference on Pattern Recognition. Piscataway: IEEE, 2018: 2069-2074.
32	MAO L, YAN Y, XUE J H, et al. Deep multi-task multi-label CNN for effective facial attribute classification [J]. IEEE Transactions on Affective Computing, 2022, 13(2): 818-828.
33	SAVCHENKO A V. Facial expression and attributes recognition based on multi-task learning of lightweight neural networks [C]// Proceedings of the IEEE 19th International Symposium on Intelligent Systems and Informatics. Piscataway: IEEE, 2021: 119-124.
34	ZHAO Z, LIU Q, ZHOU F. Robust lightweight facial expression recognition network with label distribution training [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3510-3519.
35	ZHAO Z, LIU Q, WANG S. Learning deep global multi-scale and local attention features for facial expression recognition in the wild [J]. IEEE Transactions on Image Processing, 2021, 30: 6544-6556.
36	WEN Z, LIN W, WANG T, et al. Distract your attention: multi-head cross attention network for facial expression recognition [J]. Biomimetics, 2023, 8(2): No.199.
37	LIU H, CAI H, LIN Q, et al. Adaptive multilayer perceptual attention network for facial expression recognition [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6253-6266.
38	SHI J, ZHU S, LIANG Z. Learning to amend facial expression representation via de-albino and affinity [EB/OL]. [2023-10-22].
39	HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
40	LI Y, YAO T, PAN Y, et al. Contextual Transformer networks for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1489-1500.
41	LIN T, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007.
42	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
43	GUO Y, ZHANG L, HU Y, et al. MS-Celeb-1M: a dataset and benchmark for large-scale face recognition [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9907. Cham: Springer, 2016: 87-102.
44	RIDNIK T, BEN-BARUCH E, ZAMIR N, et al. Asymmetric loss for multi-label classification [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 82-91.
45	MAHBUB U, SARKAR S, CHELLAPPA R. Segment-based methods for facial attribute detection from partial faces [J]. IEEE Transactions on Affective Computing, 2020, 11(4): 601-613.
46	GILDENBLAT J. PyTorch library for CAM methods [EB/OL]. [2023-10-22].

Task	Dataset	Training set (samples)	Test set (samples)	Description
FAE	CelebA	162 770	19 962	40 binary facial attribute categories
FER	RAF-DB	12 271	3 068	7 basic expression categories
FER	AffectNet	283 901	3 500	7 basic expression categories
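The dataset table implies two different output formats, which is one reason separate classification strategies are needed: each CelebA image carries 40 independent yes/no attribute labels, while each RAF-DB or AffectNet image carries exactly one of 7 expressions. A minimal sketch of the corresponding prediction heads, assuming a sigmoid per attribute for FAE and a softmax over classes for FER (the usual choices; the paper's exact heads are not given here), together with the mean per-attribute accuracy commonly reported for CelebA. The function names are hypothetical.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_attributes(logits: List[float], threshold: float = 0.5) -> List[int]:
    """FAE-style multi-label head: one independent sigmoid per attribute (40 for CelebA)."""
    return [1 if _sigmoid(z) >= threshold else 0 for z in logits]


def predict_expression(logits: List[float]) -> int:
    """FER-style single-label head: softmax over the 7 basic expressions, then argmax."""
    m = max(logits)                       # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]     # probabilities sum to 1 across classes
    return max(range(len(probs)), key=lambda k: probs[k])


def mean_attribute_accuracy(preds: List[List[int]], labels: List[List[int]]) -> float:
    """Average, over attributes, of each attribute's accuracy across samples."""
    n_samples, n_attrs = len(labels), len(labels[0])
    per_attr = [
        sum(p[a] == y[a] for p, y in zip(preds, labels)) / n_samples
        for a in range(n_attrs)
    ]
    return sum(per_attr) / n_attrs
```

The softmax does not change which class wins; it is kept so that the head produces a proper distribution over the 7 expressions, which is what a cross-entropy-based FER loss consumes.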

Module	AffectNet accuracy (%)	RAF-DB accuracy (%)
Baseline	64.77	90.74
Baseline + CC Attention	65.03	90.91
Baseline + Center Loss	66.66	91.75
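In this ablation the Center Loss row has the highest accuracy on both datasets. Center loss (Wen et al., ref. [19]) pulls each deep feature toward a running center for its class and, in that paper, is added to softmax cross-entropy as L = L_S + λ·L_C. The sketch below implements the loss and the mini-batch center update described by Wen et al. on plain Python lists; the 2-D features, the step size `alpha`, and the example values are hypothetical, and how FAER weights this term is not stated on this page.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def center_loss(features: Sequence[Vector], labels: Sequence[int],
                centers: Dict[int, Vector]) -> float:
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over a mini-batch."""
    return 0.5 * sum(
        sum((x - c) ** 2 for x, c in zip(feat, centers[y]))
        for feat, y in zip(features, labels)
    )


def update_centers(features: Sequence[Vector], labels: Sequence[int],
                   centers: Dict[int, Vector], alpha: float = 0.5) -> Dict[int, Vector]:
    """One update c_j <- c_j - alpha * delta_j, where delta_j is the sum of
    (c_j - x_i) over batch samples of class j, divided by (1 + n_j)."""
    updated = {}
    for j, c in centers.items():
        members = [f for f, y in zip(features, labels) if y == j]
        # Classes absent from the batch get delta = 0 and keep their center.
        delta = [sum(c[d] - f[d] for f in members) / (1 + len(members))
                 for d in range(len(c))]
        updated[j] = [cd - alpha * dd for cd, dd in zip(c, delta)]
    return updated


# Hypothetical 2-D features for two expression classes.
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
before = center_loss(feats, labels, centers)
centers = update_centers(feats, labels, centers)
after = center_loss(feats, labels, centers)
```

The centers move toward their class means, so the loss drops after one update; in a full model the features themselves also receive gradients from L_C, tightening each expression cluster.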

Loss function	Mean accuracy, Baseline (%)	Mean accuracy, FAER (%)
Asymmetric Loss	90.69	91.23
BCE Loss	91.03	91.71
Focal Loss	91.66	91.87
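Focal Loss (ref. [41]) gives the highest mean accuracy in this comparison for both the Baseline and FAER, and its 91.87% for FAER matches the CelebA result in the abstract. Focal loss multiplies the cross-entropy of each binary decision by (1 − p_t)^γ, so confidently correct attributes contribute little and training concentrates on hard ones. The sketch below applies it to independent attribute logits; `gamma=2` and `alpha=0.25` are the defaults suggested by Lin et al., not necessarily the values used for FAER.

```python
import math
from typing import Optional, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_focal_loss(logits: Sequence[float], targets: Sequence[int],
                      gamma: float = 2.0, alpha: Optional[float] = 0.25) -> float:
    """Mean focal loss over independent binary attributes.

    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t), where p_t is the predicted
    probability of the true label; alpha=None disables the balancing factor.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = _sigmoid(z)                       # probability that the attribute is present
        p_t = p if y == 1 else 1.0 - p
        if alpha is None:
            alpha_t = 1.0
        else:
            alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp avoids log(0) for extremely confident predictions.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)


def bce_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Binary cross-entropy is the special case gamma = 0 with no alpha weighting."""
    return binary_focal_loss(logits, targets, gamma=0.0, alpha=None)
```

For an easy positive (logit 4.0) the focal term keeps well under 0.1% of the BCE value, while for a hard positive (logit -2.0) it keeps most of it. One plausible reason this helps on CelebA, whose attributes are often imbalanced, is that abundant easy negatives stop dominating the gradient; the source does not test this explanation.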

Model	RAF-DB accuracy (%)	AffectNet accuracy (%)
DACL [21]	87.87	65.20
VTFF [26]	88.14	61.85
EfficientFace [34]	88.36	63.70
MA-Net [35]	88.42	64.53
PSR [28]	88.98	63.77
AMP-Net [37]	89.25	64.54
DAN [36]	89.70	65.69
EAC [22]	90.35	65.32
ARM [38]	90.42	65.20
TransFER [27]	90.91	66.23
Baseline	90.74	64.77
FAER	91.75	66.66
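The margins quoted in the abstract are absolute differences in percentage points. A small sketch that recomputes them from this table and, as an additional view not reported in the source, expresses FAER's gain over TransFER as a relative reduction in error rate:

```python
# Accuracies (%) copied from the comparison table above.
ACCURACY = {
    "TransFER": {"RAF-DB": 90.91, "AffectNet": 66.23},
    "Baseline": {"RAF-DB": 90.74, "AffectNet": 64.77},
    "FAER": {"RAF-DB": 91.75, "AffectNet": 66.66},
}


def margin_pp(model: str, reference: str, dataset: str) -> float:
    """Absolute gap in percentage points, rounded to the table's two decimals."""
    return round(ACCURACY[model][dataset] - ACCURACY[reference][dataset], 2)


def relative_error_reduction(model: str, reference: str, dataset: str) -> float:
    """Fraction of the reference model's errors removed: (e_ref - e_model) / e_ref."""
    e_ref = 100.0 - ACCURACY[reference][dataset]
    e_model = 100.0 - ACCURACY[model][dataset]
    return (e_ref - e_model) / e_ref


for dataset in ("RAF-DB", "AffectNet"):
    print(dataset,
          margin_pp("FAER", "TransFER", dataset),
          margin_pp("FAER", "Baseline", dataset),
          f"{relative_error_reduction('FAER', 'TransFER', dataset):.1%}")
```

The same 0.84-point gain removes a larger share of the remaining errors on RAF-DB (error rate about 9%) than the 0.43-point gain does on AffectNet (error rate about 34%).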

Per-class accuracy (%)
Dataset	Model	Anger	Disgust	Fear	Happiness	Neutral	Sadness	Surprise
RAF-DB	Baseline	88.20	72.80	75.39	95.74	90.86	87.65	89.97
RAF-DB	FAER	90.06	79.62	72.37	97.11	86.99	92.94	92.72
AffectNet	Baseline	61.39	66.18	67.23	77.20	53.05	71.07	59.26
AffectNet	FAER	62.25	72.86	70.22	80.97	56.57	66.33	59.93
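The per-class table shows that FAER's gains over the Baseline are not uniform across expressions. A short sketch that computes the per-class change (FAER minus Baseline, in percentage points) from the values above and lists the classes on which FAER scores lower; the helper names are hypothetical.

```python
CLASSES = ["Anger", "Disgust", "Fear", "Happiness", "Neutral", "Sadness", "Surprise"]

# Per-class accuracy (%) copied from the table above.
PER_CLASS = {
    ("RAF-DB", "Baseline"): [88.20, 72.80, 75.39, 95.74, 90.86, 87.65, 89.97],
    ("RAF-DB", "FAER"): [90.06, 79.62, 72.37, 97.11, 86.99, 92.94, 92.72],
    ("AffectNet", "Baseline"): [61.39, 66.18, 67.23, 77.20, 53.05, 71.07, 59.26],
    ("AffectNet", "FAER"): [62.25, 72.86, 70.22, 80.97, 56.57, 66.33, 59.93],
}


def per_class_change(dataset: str) -> dict:
    """FAER minus Baseline for each expression class, in percentage points."""
    base = PER_CLASS[(dataset, "Baseline")]
    faer = PER_CLASS[(dataset, "FAER")]
    return {c: round(f - b, 2) for c, b, f in zip(CLASSES, base, faer)}


def regressions(dataset: str) -> list:
    """Classes on which FAER is less accurate than the Baseline."""
    return [c for c, d in per_class_change(dataset).items() if d < 0]


for dataset in ("RAF-DB", "AffectNet"):
    print(dataset, per_class_change(dataset), "lower:", regressions(dataset))
```

On both datasets the largest per-class gain is for Disgust (+6.82 on RAF-DB, +6.68 on AffectNet), while FAER trails the Baseline on Fear and Neutral for RAF-DB and on Sadness for AffectNet.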

Related articles
[1]	Lifang WANG, Jingshuang WU, Pengliang YIN, Lihua HU. Action recognition algorithm based on attention mechanism and energy function [J]. Journal of Computer Applications, 2025, 45(1): 234-239.
[2]	Ying HUANG, Changsheng LI, Hui PENG, Su LIU. Dual-branch network guided by local entropy for dynamic scene high dynamic range imaging [J]. Journal of Computer Applications, 2025, 45(1): 204-213.
[3]	Jialin ZHANG, Qinghua REN, Qirong MAO. Speaker verification system utilizing global-local feature dependency for anti-spoofing [J]. Journal of Computer Applications, 2025, 45(1): 308-317.
[4]	Junying CHEN, Shijie GUO, Lingling CHEN. Lightweight human pose estimation based on decoupled attention and ghost convolution [J]. Journal of Computer Applications, 2025, 45(1): 223-233.
[5]	Jing QIN, Zhiguang QIN, Fali LI, Yueheng PENG. Diagnosis of major depressive disorder based on probabilistic sparse self-attention neural network [J]. Journal of Computer Applications, 2024, 44(9): 2970-2974.
[6]	Liting LI, Bei HUA, Ruozhou HE, Kuang XU. Multivariate time series prediction model based on decoupled attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2732-2738.
[7]	Zhiqiang ZHAO, Peihong MA, Xinhong HEI. Crowd counting method based on dual attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2886-2892.
[8]	Kaipeng XUE, Tao XU, Chunjie LIAO. Multimodal sentiment analysis network with self-supervision and multi-layer cross attention [J]. Journal of Computer Applications, 2024, 44(8): 2387-2392.
[9]	Pengqi GAO, Heming HUANG, Yonghong FAN. Fusion of coordinate and multi-head attention mechanisms for interactive speech emotion recognition [J]. Journal of Computer Applications, 2024, 44(8): 2400-2406.
[10]	Zhonghua LI, Yunqi BAI, Xuejin WANG, Leilei HUANG, Chujun LIN, Shiyu LIAO. Low illumination face detection based on image enhancement [J]. Journal of Computer Applications, 2024, 44(8): 2588-2594.
[11]	Shangbin MO, Wenjun WANG, Ling DONG, Shengxiang GAO, Zhengtao YU. Single-channel speech enhancement based on multi-channel information aggregation and collaborative decoding [J]. Journal of Computer Applications, 2024, 44(8): 2611-2617.
[12]	Li LIU, Haijin HOU, Anhong WANG, Tao ZHANG. Generative data hiding algorithm based on multi-scale attention [J]. Journal of Computer Applications, 2024, 44(7): 2102-2109.
[13]	Song XU, Wenbo ZHANG, Yifan WANG. Lightweight video salient object detection network based on spatiotemporal information [J]. Journal of Computer Applications, 2024, 44(7): 2192-2199.
[14]	Dahai LI, Zhonghua WANG, Zhendong WANG. Dual-branch low-light image enhancement network combining spatial and frequency domain information [J]. Journal of Computer Applications, 2024, 44(7): 2175-2182.
[15]	Wenliang WEI, Yangping WANG, Biao YUE, Anzheng WANG, Zhe ZHANG. Deep learning model for infrared and visible image fusion based on illumination weight allocation and attention [J]. Journal of Computer Applications, 2024, 44(7): 2183-2191.