Attention-guided symmetric positive definite second-order representation for facial expression recognition

doi:10.11772/j.issn.1001-9081.2026020192

Abstract

Facial Expression Recognition (FER) in the wild remains challenging because expression-relevant cues are often localized, while background textures, occlusions, and illumination variations can severely corrupt feature statistics. Noisy annotations and class imbalance further degrade robustness and blur the decision boundaries between easily confused categories. To improve FER robustness under complex real-world perturbations, this paper proposes QGSP-Former (Quality-Gated SPD-enhanced Pyramid Former), a quality-gated, Symmetric Positive Definite (SPD)-enhanced pyramid dual-stream Transformer. Built upon a strong appearance–structure dual-stream backbone, the model learns multi-scale fused features through cross-stream interaction, in which the structure stream provides relatively stable guidance when occlusion and pose changes make appearance cues unreliable. An attention-guided second-order branch constructs weighted covariance representations that focus on expression-relevant regions and suppress background and occlusion interference. Because covariance matrices lie on the SPD manifold, treating them as ordinary Euclidean vectors distorts the metric; the branch therefore uses a bilinear mapping for dimensionality reduction and a log-eigenvalue mapping for a stable transformation from the manifold to Euclidean space. The resulting SPD features are tokenized and fed, together with the base tokens, into the same Transformer encoder, so that both kinds of tokens interact within a single sequence. In addition, a sample-quality-aware gating mechanism adaptively regulates the contribution of the second-order branch, alleviating the negative transfer caused by unreliable high-order statistics on low-quality samples. On three 7-class benchmarks, RAF-DB, AffectNet, and FER2013, QGSP-Former achieves accuracies of 92.73%, 67.69%, and 75.58%, outperforming POSTER++ by 0.52, 0.20, and 0.45 percentage points, respectively.
These results indicate that geometry-consistent second-order representation and quality-gated fusion consistently improve robustness under real-world perturbations and information-limited imaging conditions.
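The second-order branch summarized above (attention-weighted covariance, bilinear reduction, log-eigenvalue mapping) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the attention weights are taken as a given non-negative vector over spatial locations, a small ridge `eps * I` is added to keep the covariance strictly positive definite, and the bilinear projection `W` is assumed to have orthonormal columns, as is common in SPD-network layers. The function names, the sizes, and the final upper-triangle vectorization are hypothetical choices for illustration.

```python
import numpy as np


def weighted_covariance(features, attn, eps=1e-4):
    """Attention-weighted covariance of N local descriptors.

    features: (N, C) array, one C-dim descriptor per spatial location.
    attn:     (N,) non-negative attention weights, not all zero.
    Returns a (C, C) symmetric matrix; eps * I keeps it strictly positive definite.
    """
    w = attn / attn.sum()                      # normalise weights to sum to 1
    mu = w @ features                          # weighted mean, shape (C,)
    centred = features - mu                    # (N, C)
    cov = (centred * w[:, None]).T @ centred   # sum_i w_i (x_i - mu)(x_i - mu)^T
    return cov + eps * np.eye(features.shape[1])


def bimap(spd, W):
    """Bilinear map W^T X W; W is (C, d) with orthonormal columns, so the result stays SPD."""
    return W.T @ spd @ W


def log_eig(spd):
    """Matrix logarithm of an SPD matrix: U diag(log(lambda)) U^T."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs * np.log(vals)) @ vecs.T      # scale each eigenvector column by log(lambda)


def upper_triangle_vector(sym):
    """Flatten a symmetric d x d matrix into its d(d+1)/2 upper-triangular entries."""
    rows, cols = np.triu_indices(sym.shape[0])
    return sym[rows, cols]


# Illustrative sizes: a 7x7 feature map (N = 49) with C = 16 channels, reduced to d = 8.
rng = np.random.default_rng(0)
N, C, d = 49, 16, 8
feats = rng.standard_normal((N, C))
attn = rng.random(N)                               # stand-in for a learned spatial attention map
W, _ = np.linalg.qr(rng.standard_normal((C, d)))   # random (C, d) matrix with orthonormal columns

cov = weighted_covariance(feats, attn)         # (16, 16) SPD
reduced = bimap(cov, W)                        # (8, 8) SPD
log_cov = log_eig(reduced)                     # (8, 8) symmetric matrix in log space
spd_vector = upper_triangle_vector(log_cov)    # (36,) vector ready for tokenization
print(spd_vector.shape)                        # (36,)
```

The log-eigenvalue step is what flattens the geometry: for an SPD matrix X = U diag(λ) Uᵀ it returns U diag(log λ) Uᵀ, so subsequent Euclidean operations such as linear tokenization act on a space where distances are no longer distorted by the curvature of the SPD manifold. With uniform attention weights and `eps=0`, `weighted_covariance` reduces to the ordinary (biased) sample covariance.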

Key words: facial expression recognition, second-order statistic, symmetric positive definite manifold, attention mechanism, quality gating

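Chinese abstract: In real-world facial expression recognition, expression-related signals are often confined to local regions, and background textures, occlusion patches, and illumination changes significantly interfere with feature statistics. Meanwhile, under data distributions in which noisy labels and class imbalance are more pronounced, the model is more easily misled by unreliable information, which increases intra-class dispersion and blurs the boundaries between easily confused categories. To improve robust facial expression recognition under complex perturbations, a quality-gated, Symmetric Positive Definite (SPD)-enhanced pyramid dual-stream Transformer recognition model, QGSP-Former, is proposed. First, multi-scale fused features are learned through cross-scale interaction between an appearance stream and a structure stream, where the structure branch provides relatively stable guidance when occlusion or pose changes make appearance cues unstable. Next, attention guidance is introduced when the second-order statistics are constructed, so that the weighted covariance focuses on expression-related regions and suppresses occlusion and background interference. Given that covariance matrices lie on the SPD manifold, a bilinear mapping and a logarithmic spectral mapping are used to obtain a geometry-consistent SPD representation, which is tokenized and fed together with the base tokens into the encoder for sequence-level interactive fusion. Furthermore, a sample-quality-aware gate adaptively regulates the contribution of the high-order branch, reducing the negative transfer caused by unreliable statistics. Experiments on three 7-class benchmarks, RAF-DB, AffectNet, and FER2013, yield accuracies of 92.73%, 67.69%, and 75.58%, improvements of 0.52, 0.20, and 0.45 percentage points over POSTER++, respectively. The results show that geometry-consistent second-order statistical representation and quality-gated fusion can continually extract more stable discriminative cues under real-world perturbations and "information-limited" imaging conditions, thereby improving overall robustness and the ability to distinguish easily confused categories.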
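The abstract does not specify how sample quality is measured or exactly where the gate acts. A minimal NumPy sketch, assuming the gate is a sigmoid of a linear function of the mean-pooled base tokens and that it scales the tokenized SPD features before they are appended to the base-token sequence, is given below. `tokenize_spd`, `quality_gate`, `fuse_tokens`, and the parameters `P`, `w_gate`, and `b_gate` are hypothetical; the shared Transformer encoder that would consume `sequence` is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tokenize_spd(spd_vectors, P, num_tokens):
    """Project vectorised log-SPD features (B, k) into num_tokens tokens of width D.

    P is a hypothetical learned (k, num_tokens * D) projection matrix.
    """
    batch = spd_vectors.shape[0]
    return (spd_vectors @ P).reshape(batch, num_tokens, -1)


def quality_gate(base_tokens, w, b):
    """Hypothetical per-sample gate in (0, 1) computed from mean-pooled base tokens."""
    pooled = base_tokens.mean(axis=1)          # (B, D) summary of each sample
    return sigmoid(pooled @ w + b)             # (B,)


def fuse_tokens(base_tokens, spd_tokens, gate):
    """Scale each sample's SPD tokens by its gate and append them to the base sequence."""
    gated = spd_tokens * gate[:, None, None]   # (B, S, D)
    return np.concatenate([base_tokens, gated], axis=1)   # (B, T + S, D)


# Illustrative sizes: batch B = 2, T = 49 base tokens, width D = 32,
# S = 4 SPD tokens built from a k = 36-dim log-SPD vector (d = 8).
rng = np.random.default_rng(1)
B, T, D, S, k = 2, 49, 32, 4, 36
base = rng.standard_normal((B, T, D))
spd_vec = rng.standard_normal((B, k))
P = rng.standard_normal((k, S * D)) / np.sqrt(k)
w_gate, b_gate = 0.1 * rng.standard_normal(D), 0.0

spd_tok = tokenize_spd(spd_vec, P, S)          # (2, 4, 32)
gate = quality_gate(base, w_gate, b_gate)      # (2,), values strictly inside (0, 1)
sequence = fuse_tokens(base, spd_tok, gate)    # (2, 53, 32), input to the shared encoder
print(sequence.shape)                          # (2, 53, 32)
```

In this sketch the sequence length is fixed regardless of sample quality; a gate near zero shrinks the SPD tokens toward the zero vector, which damps (though does not strictly remove) their influence in the encoder's attention, while a gate near one passes them through unchanged.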


CLC Number: TP391.41

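YU Songsen (余松森), YANG Rongbin (杨荣彬), FANG Yiwu (方义武). Attention-guided symmetric positive definite second-order representation for facial expression recognition [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2026020192.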
