Dual-channel Multimodal Sentiment Analysis Model Based on Contrast Invariance And Enhance Specificity

doi:10.11772/j.issn.1001-9081.2025060731

Journal of Computer Applications

Received: 2025-07-02  Revised: 2025-08-25  Online: 2025-09-05  Published: 2025-09-05

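Yunping HE, Leichun WANG, Ruirui SONG, Xiangfeng LU, Jinxiang WEI, Xiaomeng LIU

School of Computer Science, Hubei University

Corresponding author: Leichun WANG
Funding:
Research on a building energy consumption prediction method based on deep cascaded generative adversarial networks in a spatio-temporal data environment; Research on a knowledge-graph-guided, content-controllable generative text steganography method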

Abstract

Existing multimodal sentiment analysis methods often produce inaccurate results because of modal heterogeneity and insufficient internal interaction. To address this problem, a dual-channel multimodal sentiment analysis model based on Contrast Invariance and Reinforcement Specificity (CIRS) was proposed. First, features were extracted from text, video, and audio data and aligned in dimension. Second, the invariant features of the modalities were contrasted for consistency, and mutual learning of invariant features across modalities was enhanced through homogeneous graph distillation to improve the consistency of modal representations. Third, the specific features of each modality were reinforced, and heterogeneous graph distillation was used to transfer knowledge of specific features between modalities, achieving semantic space alignment between modalities. Finally, the invariant features and specific features were deeply fused through a self-attention mechanism and a cross-modal attention mechanism, and the fused representation was used for prediction. Experimental results on multimodal datasets show that, compared with DLF (Disentangled-Language-Focused), CIRS reduced the Mean Absolute Error (MAE) on CMU-MOSI (Carnegie Mellon University Multimodal Opinion Sentiment Intensity) by 4.11% and improved both the 2-class accuracy (Acc-2) and the F1 score (F1-score) by 1.29%; on CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity), it reduced the MAE by 1.85% and improved Acc-2 and F1-score by 0.70% and 0.94%, respectively. These results verify that CIRS can effectively reduce errors and improve classification accuracy in multimodal sentiment analysis.

Key words: multimodal sentiment analysis, modal heterogeneity, semantic space alignment, self-attention mechanism, cross-modal attention mechanism
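The three metrics reported in the abstract (MAE, Acc-2, F1-score) can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes sentiment intensity is a real-valued score, that Acc-2 binarizes scores as negative versus non-negative at a threshold of 0, and that F1 is computed for the non-negative class only; the abstract does not state these conventions, and the function names and sample scores are hypothetical.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between annotated and predicted sentiment intensity."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binarize(scores: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Assumed convention: score >= threshold is non-negative (1), otherwise negative (0)."""
    return [1 if s >= threshold else 0 for s in scores]


def acc2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """2-class accuracy after binarizing both label and prediction."""
    t, p = binarize(y_true), binarize(y_pred)
    return sum(a == b for a, b in zip(t, p)) / len(t)


def f1_binary(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """F1 of the non-negative class (a simplification; weighted F1 is also common)."""
    t, p = binarize(y_true), binarize(y_pred)
    tp = sum(a == 1 and b == 1 for a, b in zip(t, p))
    fp = sum(a == 0 and b == 1 for a, b in zip(t, p))
    fn = sum(a == 1 and b == 0 for a, b in zip(t, p))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotated and predicted intensities for four utterances.
labels = [2.0, -1.5, 0.5, -0.5]
preds = [1.5, -1.0, -0.5, -1.0]

print(mean_absolute_error(labels, preds))  # 0.625
print(acc2(labels, preds))                 # 0.75
print(round(f1_binary(labels, preds), 3))  # 0.667
```

Lower MAE is better, while higher Acc-2 and F1-score are better, which is why the abstract reports a reduction for the former and improvements for the latter.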

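Yunping HE, Leichun WANG, Ruirui SONG, Xiangfeng LU, Jinxiang WEI, Xiaomeng LIU. Dual-channel multimodal sentiment analysis model based on contrast invariance and enhance specificity [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025060731.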
