Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (6): 1767-1775. DOI: 10.11772/j.issn.1001-9081.2025060731

• Artificial intelligence

Dual-channel multimodal sentiment analysis model based on contrast invariance and reinforcement specificity

Yunping HE, Leichun WANG, Ruirui SONG, Xiangfeng LU, Jinxiang WEI, Xiaomeng LIU

  1. School of Computer Science, Hubei University, Wuhan, Hubei 430062, China
  • Received: 2025-07-02 Revised: 2025-08-25 Accepted: 2025-08-28 Online: 2025-09-05 Published: 2026-06-10
  • Contact: Leichun WANG
  • About the authors: HE Yunping, born in 2000, M. S. candidate. His research interests include multimodal sentiment analysis and long time series prediction.
    SONG Ruirui, born in 1999, M. S. candidate. Her research interests include long time series prediction and multimodal data analysis.
    LU Xiangfeng, born in 2000, M. S. candidate. Her research interests include multimodal data analysis and fake news detection.
    WEI Jinxiang, born in 2000, M. S. candidate. Her research interests include deep learning and spatio-temporal data prediction.
    LIU Xiaomeng, born in 2001, M. S. candidate. Her research interests include deep learning and multimodal data analysis.
    First contact: WANG Leichun, born in 1974, Ph. D., associate professor. His research interests include deep learning and big data analysis.
  • Supported by:
    National Natural Science Foundation of China (62106069); National Social Science Foundation of China (24BTQ019)

Dual-channel multimodal sentiment analysis model based on contrastive invariance and reinforced specificity

HE Yunping, WANG Leichun, SONG Ruirui, LU Xiangfeng, WEI Jinxiang, LIU Xiaomeng

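  1. School of Computer Science, Hubei University, Wuhan 430062
  • Corresponding author: WANG Leichun
  • About the authors: HE Yunping (born in 2000), male, from Jingzhou, Hubei, M. S. candidate, CCF member. Main research interests: multimodal sentiment analysis, long time series prediction.
    SONG Ruirui (born in 1999), female, from Zaozhuang, Shandong, M. S. candidate. Main research interests: long time series prediction, multimodal data analysis.
    LU Xiangfeng (born in 2000), female, from Linyi, Shandong, M. S. candidate. Main research interests: multimodal data analysis, fake news detection.
    WEI Jinxiang (born in 2000), female, from Fuyang, Anhui, M. S. candidate. Main research interests: deep learning, spatio-temporal data prediction.
    LIU Xiaomeng (born in 2001), female, from Zaozhuang, Shandong, M. S. candidate. Main research interests: deep learning, multimodal data analysis.
    First contact: WANG Leichun (born in 1974), male, from Wuhan, Hubei, associate professor, Ph. D. Main research interests: deep learning, big data analysis.
  • Supported by:
    National Natural Science Foundation of China (62106069); National Natural Science Foundation of China (62102136); National Social Science Foundation of China (24BTQ019)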

Abstract:

To address the problem that existing Multimodal Sentiment Analysis (MSA) methods often produce inaccurate results because of modal heterogeneity and insufficient internal interaction, a dual-channel MSA model based on Contrast Invariance and Reinforcement Specificity (CIRS) was proposed. Firstly, features were extracted from text, video and audio data and aligned in dimension. Secondly, the invariant features of the modalities were contrasted for consistency, and mutual learning of invariant features between modalities was enhanced through homogeneous graph distillation, so as to improve the representation consistency of the modalities. Thirdly, the specific features of each modality were strengthened, and knowledge of specific features was transferred between modalities through heterogeneous graph distillation, so as to achieve semantic spatial alignment between modalities. Finally, the invariant features and specific features were deeply fused through a self-attention mechanism and a cross-modal attention mechanism, and the fused representation was used for prediction. Experimental results show that, compared with DLF (Disentangled-Language-Focused multimodal sentiment analysis), CIRS reduces the Mean Absolute Error (MAE) by 4.11% and improves both 2-Class Accuracy (Acc-2) and F1-score by 1.29% on the CMU-MOSI (Carnegie Mellon University Multimodal Opinion Sentiment Intensity) dataset; on the CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity) dataset, CIRS reduces the MAE by 1.85% and improves the Acc-2 and F1-score by 0.70% and 0.94%, respectively. These results indicate that CIRS effectively reduces error and improves classification accuracy in multimodal sentiment analysis.
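The three reported metrics can be illustrated with a minimal sketch. Several points here are assumptions rather than details from the abstract: the labels are treated as continuous sentiment intensities, Acc-2 and F1 are computed by splitting scores into negative versus non-negative, F1 is the plain positive-class F1 (papers in this area often report a weighted variant), and `relative_change` assumes the quoted percentages are relative to the baseline, which the abstract does not state. All numbers below are toy values, not results from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error between gold and predicted sentiment intensities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binarize(scores, threshold=0.0):
    """Map continuous scores to 1 (non-negative) or 0 (negative).

    The negative / non-negative split is an assumed convention.
    """
    return [1 if s >= threshold else 0 for s in scores]


def acc2(y_true, y_pred):
    """Two-class accuracy after binarizing both gold and predicted scores."""
    t, p = binarize(y_true), binarize(y_pred)
    return sum(a == b for a, b in zip(t, p)) / len(t)


def f1_binary(y_true, y_pred):
    """F1 score of the positive (non-negative) class after binarization."""
    t, p = binarize(y_true), binarize(y_pred)
    tp = sum(a == 1 and b == 1 for a, b in zip(t, p))
    fp = sum(a == 0 and b == 1 for a, b in zip(t, p))
    fn = sum(a == 1 and b == 0 for a, b in zip(t, p))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (negative = reduction)."""
    return (new - baseline) / baseline * 100.0


# Toy values for illustration only; they are not taken from the paper.
gold = [2.0, -1.0, 0.5, -2.5]
pred = [1.5, -0.5, -0.5, -2.0]
print(mae(gold, pred), acc2(gold, pred), f1_binary(gold, pred))
print(relative_change(2.0, 1.5))
```

Under the relative reading, a lower MAE gives a negative change and a higher Acc-2 or F1 gives a positive one; if the paper's percentages are absolute percentage-point differences, the helper would simply be `new - baseline`.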

Key words: Multimodal Sentiment Analysis (MSA), modal heterogeneity, semantic spatial alignment, self-attention mechanism, cross-modal attention mechanism

Abstract:

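To address the problem that existing multimodal sentiment analysis (MSA) methods often yield inaccurate sentiment analysis results because of modal heterogeneity and insufficient internal interaction, a dual-channel MSA model based on contrastive invariance and reinforced specificity (CIRS) is proposed. First, features are extracted from text, video and audio data and aligned in dimension. Second, the invariant features of the modalities are contrasted for consistency, and homogeneous graph distillation is used to strengthen mutual learning of invariant features across modalities, improving the representation consistency of the modalities. Third, the specific features of the modalities are reinforced, and heterogeneous graph distillation is used to transfer knowledge of specific features between modalities, achieving semantic spatial alignment across modalities. Finally, self-attention and cross-modal attention mechanisms are used to deeply fuse the invariant and specific features and to make predictions. Experimental results show that, compared with DLF (Disentangled-Language-Focused multimodal sentiment analysis), CIRS reduces the mean absolute error (MAE) by 4.11% and improves both binary classification accuracy (Acc-2) and F1 score by 1.29% on the CMU-MOSI (Carnegie Mellon University Multimodal Opinion Sentiment Intensity) dataset; on the CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity) dataset, it reduces MAE by 1.85% and improves Acc-2 and F1 score by 0.70% and 0.94%, respectively. These results verify that CIRS can effectively reduce error and improve classification accuracy in multimodal sentiment analysis.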
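The final fusion step relies on self-attention and cross-modal attention. The sketch below shows only the generic scaled dot-product mechanism behind both, not the CIRS architecture itself. It assumes the modality features already share one dimension (as the alignment step implies) and omits the learned query/key/value projections and multiple heads that a real model would use. The names `attention`, `text` and `audio` and all values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each query produces a weighted average of `values`, with weights given
    by the softmax of its scaled dot products with `keys`.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Hypothetical, already dimension-aligned features (2 text steps, 3 audio steps).
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]

# Self-attention: queries, keys and values all come from the same modality.
self_attended = attention(text, text, text)

# Cross-modal attention: text queries attend over audio keys and values.
cross_attended = attention(text, audio, audio)

print(self_attended)
print(cross_attended)
```

The only difference between the two calls is where the keys and values come from, which is why one routine can serve both roles in a sketch like this. The cross-modal output keeps the sequence length of the querying modality.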

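Key words: multimodal sentiment analysis, modal heterogeneity, semantic spatial alignment, self-attention mechanism, cross-modal attention mechanism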

CLC Number: