Journal of Computer Applications
余松森,杨荣彬,方义武
Abstract: Facial Expression Recognition (FER) in the wild remains challenging because expression-relevant cues are often localized, while background textures, occlusions, and illumination variations can severely corrupt feature statistics. Noisy annotations and class imbalance further degrade robustness and blur decision boundaries between easily confused categories. To improve robustness under such real-world perturbations, this paper proposes QGSP-Former (Quality-Gated SPD-enhanced Pyramid Former), a pyramid dual-stream Transformer for in-the-wild FER that is enhanced with Symmetric Positive Definite (SPD) second-order features and a quality gate. Built upon an appearance-structure dual-stream backbone, the model learns multi-scale fused features via cross-stream interactions, where the structure stream provides relatively stable guidance when appearance cues become unreliable under occlusion and pose changes. An attention-guided second-order branch constructs weighted covariance representations that focus on expression-relevant regions and suppress background and occlusion interference. Since covariance matrices lie on the SPD manifold, a geometry-consistent mapping is adopted to avoid metric distortion: bilinear mapping is used for dimensionality reduction, and log-eigenvalue mapping provides a stable manifold-to-Euclidean transformation. The resulting SPD features are tokenized and fused with the base tokens within the same Transformer encoder to enable unified sequence interaction. Moreover, a sample-quality-aware gating mechanism adaptively regulates the contribution of the second-order branch, alleviating negative transfer caused by unreliable high-order statistics on low-quality samples. On three 7-class benchmarks (RAF-DB, AffectNet, and FER2013), the model achieves accuracies of 92.73%, 67.69%, and 75.58%, outperforming POSTER++ by 0.52, 0.20, and 0.45 percentage points, respectively. These results indicate that geometry-consistent second-order representation and quality-gated fusion consistently improve robustness under real-world perturbations and information-limited imaging conditions.
Key words: facial expression recognition, second-order statistic, symmetric positive definite manifold, attention mechanism, quality gating
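The second-order branch summarized in the abstract (attention-weighted covariance, bilinear dimensionality reduction, log-eigenvalue mapping, tokenization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the regularization constant `eps`, the random projection `W`, the feature sizes, and the upper-triangle tokenization are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def weighted_covariance(features, attention, eps=1e-4):
    """Attention-weighted covariance of local descriptors.

    features: (N, C) array, one C-dim descriptor per spatial location.
    attention: (N,) non-negative scores (assumed to come from an attention map).
    eps: small ridge term (an assumption) that keeps the result strictly SPD.
    """
    w = attention / attention.sum()              # normalize weights to sum to 1
    mean = w @ features                          # weighted mean, shape (C,)
    centered = features - mean                   # (N, C)
    cov = (centered * w[:, None]).T @ centered   # sum_i w_i c_i c_i^T, (C, C)
    return cov + eps * np.eye(features.shape[1])

def bilinear_map(spd, W):
    """Reduce a (C, C) SPD matrix to (d, d) via W^T S W; W is (C, d), full column rank."""
    return W.T @ spd @ W

def log_eig(spd):
    """Matrix logarithm via eigendecomposition: U diag(log(lambda)) U^T."""
    vals, vecs = np.linalg.eigh(spd)
    return (vecs * np.log(vals)) @ vecs.T

# Toy sizes (hypothetical): 49 locations, 16 channels, reduced to 8.
rng = np.random.default_rng(0)
features = rng.normal(size=(49, 16))
attention = rng.random(49)
W, _ = np.linalg.qr(rng.normal(size=(16, 8)))    # orthonormal columns

S = weighted_covariance(features, attention)
S_small = bilinear_map(S, W)
log_S = log_eig(S_small)

# One possible tokenization (an assumption): flatten the upper triangle.
token = log_S[np.triu_indices(8)]
```

Because `log_S` is an ordinary symmetric matrix, it can be flattened and treated as Euclidean data, which is the reason for applying a log-eigenvalue map before tokens are passed to a standard encoder.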
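Abstract (translated from the Chinese edition): In facial expression recognition in real-world scenes, expression-relevant signals are often confined to local regions, and background textures, occluding patches, and illumination changes significantly disturb feature statistics. Meanwhile, under data distributions where noisy annotations and class imbalance are more prominent, the model is more easily affected by unreliable information, which increases intra-class dispersion and blurs the boundaries between easily confused categories. To improve robust facial expression recognition under complex perturbations, a quality-gated, Symmetric Positive Definite (SPD) enhanced pyramid dual-stream Transformer recognition model, QGSP-Former, is proposed. First, multi-scale fused features are learned through cross-scale interaction between an appearance stream and a structure stream, where the structure branch provides relatively stable guidance when occlusion and pose changes make appearance cues unstable. Next, attention guidance is introduced at the second-order statistics construction stage, so that the weighted covariance focuses on expression-relevant regions and suppresses occlusion and background interference. Because covariance matrices lie on the SPD manifold, bilinear mapping and log-spectral (log-eigenvalue) mapping are used to obtain a geometry-consistent SPD representation, which is tokenized and fed into the encoder together with the base tokens for sequence-level interaction and fusion. Furthermore, sample-quality-aware gating is introduced to adaptively regulate the contribution of the high-order branch, reducing negative transfer caused by unreliable statistics. Experiments on three 7-class benchmarks (RAF-DB, AffectNet, and FER2013) yield accuracies of 92.73%, 67.69%, and 75.58%, improvements of 0.52, 0.20, and 0.45 percentage points over POSTER++, respectively. The results show that geometry-consistent second-order statistical representation and quality-gated fusion can keep extracting more stable discriminative cues under real-scene perturbations and "information-limited" imaging conditions, thereby improving overall robustness and the ability to distinguish easily confused categories.
Key words (translated from the Chinese edition): facial expression recognition, second-order statistics, symmetric positive definite manifold, attention mechanism, quality gating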
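The sample-quality-aware gating mentioned in the abstract can be pictured with the following minimal NumPy sketch. The abstract does not say how sample quality is estimated or how the gate is applied, so everything here is an assumption for illustration: the quality score is taken to be a sigmoid of a linear projection of the mean-pooled base tokens, and the gate is a single scalar that scales the second-order (SPD) tokens before they are concatenated with the base tokens. The names `quality_gated_fusion`, `w_q`, and `b_q` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quality_gated_fusion(base_tokens, spd_tokens, w_q, b_q):
    """Scale second-order tokens by a per-sample quality gate, then concatenate.

    base_tokens: (T, D) tokens from the backbone.
    spd_tokens:  (K, D) tokens derived from the SPD branch.
    w_q, b_q:    hypothetical gate parameters, shapes (D,) and scalar.
    Returns the fused (T + K, D) sequence and the scalar gate in (0, 1).
    """
    pooled = base_tokens.mean(axis=0)        # crude per-sample summary, (D,)
    gate = sigmoid(pooled @ w_q + b_q)       # near 0 suppresses the SPD branch
    fused = np.concatenate([base_tokens, gate * spd_tokens], axis=0)
    return fused, gate

rng = np.random.default_rng(0)
base = rng.normal(size=(5, 8))
spd = rng.normal(size=(2, 8))
w_q = rng.normal(size=8)

fused, gate = quality_gated_fusion(base, spd, w_q, b_q=0.0)
# A strongly negative bias mimics a sample judged to be of low quality.
fused_low, gate_low = quality_gated_fusion(base, spd, w_q, b_q=-50.0)
```

With the gate close to zero, the fused sequence is effectively the base tokens alone, which is one way unreliable high-order statistics could be kept from influencing the prediction.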
CLC Number: TP391.41
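余松森, 杨荣彬, 方义武. Facial expression recognition based on attention-guided symmetric positive definite second-order representation[J]. Journal of Computer Applications (sole official website), DOI: 10.11772/j.issn.1001-9081.2026020192.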
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2026020192