Journal of Computer Applications ›› 44-49. DOI: 10.11772/j.issn.1001-9081.2024040529

• Artificial Intelligence •

Body movement emotion recognition method based on emotional latent space learning and CLIP model

Hong LUO1, Yujie SHEN2, Juanjuan CHEN3, Dan WANG3

  1. China Mobile (Hangzhou) Information Technology Company Limited, Hangzhou Zhejiang 310023, China
    2. Hangzhou Institute of Technology, Xidian University, Hangzhou Zhejiang 311231, China
    3. State Key Laboratory of Integrated Services Networks (Xidian University), Xi'an Shaanxi 710071, China
  • Received: 2024-04-28  Revised: 2024-06-26  Accepted: 2024-06-27  Online: 2025-01-24  Published: 2024-12-31
  • Contact: Dan WANG
  • About the authors: LUO Hong, born in 1977 in Tianlin, Guangxi, senior engineer. Her research interests include AI interaction and smart home.
    SHEN Yujie, born in 1999 in Huzhou, Zhejiang, M.S. candidate. His research interests include affective computing and body movement emotion recognition.
    CHEN Juanjuan, born in 1998 in Dingxi, Gansu, M.S. candidate. Her research interests include multimodal affective computing and emotion-driven multi-turn dialogue.
    WANG Dan, born in 1992 in Longnan, Gansu, Ph.D., associate professor. Her research interests include machine learning, multi-agent reinforcement learning, and knowledge modeling and computation.



Abstract:

The key to body movement emotion recognition lies in extracting the emotional features embedded in human body movements. To address the poor emotional feature learning capability of existing models and the resulting difficulty in improving emotion recognition accuracy, a body movement emotion recognition method based on Emotional Latent Space Learning (ELSL) and the Contrastive Language-Image Pre-training (CLIP) model was proposed. Firstly, the CLIP model was introduced to enhance the emotional feature learning capability of the model. Secondly, the ELSL method was proposed for the fine-grained multi-label emotion classification task. By learning discriminative mappings from the emotional latent space to the individual subspaces, this method captures, in each emotional subspace, the subtle differences between emotion categories and the feature information beneficial to the classification of each emotion category. Experiments were carried out on the Body Language Dataset (BoLD), which is oriented to real-world open scenarios. The results demonstrate that the proposed method makes effective use of the advantages of the CLIP model and latent space learning in feature learning, leading to significant performance improvements. Specifically, compared with the Movement Analysis Network (MANet), the proposed method achieves a 1.08-percentage-point increase in mean Average Precision (mAP) and a 1.32-percentage-point improvement in mean Area Under Receiver Operating Characteristic Curve (mRA).
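To make the ELSL idea above concrete, the sketch below shows one plausible reading of "learning discriminative mappings from the emotional latent space to individual subspaces": a shared latent feature (e.g., from CLIP's image encoder) is projected into one learned subspace per emotion label, and each subspace yields an independent binary logit for multi-label classification. This is a hedged illustration only, not the authors' implementation; the class name EmotionalSubspaceHead, the dimensions latent_dim=512 and subspace_dim=64, and the training snippet are all assumptions (num_emotions=26 follows BoLD's 26 categorical emotion labels).

    import torch
    import torch.nn as nn

    class EmotionalSubspaceHead(nn.Module):
        """Hypothetical ELSL-style head: one learned subspace per emotion.

        A shared latent feature is mapped by a per-label projection into a
        low-dimensional emotional subspace, where a per-label classifier
        produces one binary (multi-label) logit. Names and sizes are
        illustrative assumptions, not taken from the paper.
        """

        def __init__(self, latent_dim: int = 512, subspace_dim: int = 64,
                     num_emotions: int = 26):
            super().__init__()
            # One discriminative mapping (projection) per emotion subspace.
            self.projections = nn.ModuleList(
                [nn.Linear(latent_dim, subspace_dim) for _ in range(num_emotions)])
            # One binary classifier per subspace.
            self.classifiers = nn.ModuleList(
                [nn.Linear(subspace_dim, 1) for _ in range(num_emotions)])

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            # z: (batch, latent_dim) latent features, e.g. pooled frame
            # embeddings from a CLIP image encoder.
            logits = [clf(torch.relu(proj(z)))  # (batch, 1) per emotion
                      for proj, clf in zip(self.projections, self.classifiers)]
            return torch.cat(logits, dim=-1)    # (batch, num_emotions)

    # Multi-label training treats each emotion as an independent sigmoid:
    head = EmotionalSubspaceHead()
    z = torch.randn(8, 512)                         # stand-in for CLIP features
    targets = torch.randint(0, 2, (8, 26)).float()  # multi-hot emotion labels
    loss = nn.BCEWithLogitsLoss()(head(z), targets)

Keeping a separate low-dimensional projection per label, rather than a single shared classifier, lets each subspace specialize in the cues that separate its emotion from the rest, which is one way the "subtle differences between emotion categories" mentioned in the abstract could be captured.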

Key words: body movement emotion recognition, Contrastive Language-Image Pre-training (CLIP) model, latent space learning, prompt learning, multi-label classification

CLC number: