Journal of Computer Applications
He Yunping, Wang Leichun, Song Ruirui, Lu Xiangfeng, Wei Jinxiang, Liu Xiaomeng
Abstract: To address the problem that existing multimodal sentiment analysis methods often produce inaccurate results due to modality heterogeneity and insufficient intra-modal interaction, a dual-channel multimodal sentiment analysis model based on Contrast Invariance and Reinforcement Specificity (CIRS) was proposed. First, features were extracted from the text, video, and audio data and aligned in dimension. Second, consistency contrast was applied to the modality-invariant features, and mutual learning of invariant features across modalities was enhanced through homogeneous graph distillation to improve the consistency of modality representations. Third, the modality-specific features were reinforced, and heterogeneous graph distillation was used to transfer knowledge of specific features across modalities, achieving semantic space alignment among them. Finally, the invariant and specific features were deeply fused and used for prediction through a self-attention mechanism and a cross-modal attention mechanism. Experimental results on multimodal datasets show that, compared with DLF (Disentangled-Language-Focused), CIRS reduced the Mean Absolute Error (MAE) on CMU-MOSI (Carnegie Mellon University Multimodal Opinion Sentiment Intensity) by 4.11% and improved both the 2-class accuracy (Acc-2) and the F1 score by 1.29%; on CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity), the MAE was reduced by 1.85%, and Acc-2 and F1 score were improved by 0.70% and 0.94%, respectively. These results verify that CIRS can effectively reduce error and improve classification accuracy in multimodal sentiment analysis.
Key words: multimodal sentiment analysis, modality heterogeneity, semantic space alignment, self-attention mechanism, cross-modal attention mechanism
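The fusion step described in the abstract, in which one modality's features attend to another's, can be illustrated with a minimal NumPy sketch of scaled dot-product cross-modal attention. This is an assumption-laden toy (the function name, dimensions, and single-head form are illustrative, not the paper's actual implementation, which also involves self-attention and learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, kv_feats):
    """One modality (queries, e.g. text) attends to another (keys/values, e.g. audio).

    Both inputs are assumed to be dimension-aligned, as in the abstract's first step.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)  # (T_q, T_kv) similarity scores
    weights = softmax(scores, axis=-1)              # each query row sums to 1
    return weights @ kv_feats                       # audio-informed text features, (T_q, d)

rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))    # 4 text tokens, feature dim 8 after alignment
audio = rng.standard_normal((6, 8))   # 6 audio frames, same aligned dim
fused = cross_modal_attention(text, audio)
print(fused.shape)  # (4, 8): one fused vector per text token
```

In a full model, learned query/key/value projections and multiple heads would replace the raw dot products, and the symmetric direction (audio attending to text) would typically be computed as well.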
He Yunping, Wang Leichun, Song Ruirui, Lu Xiangfeng, Wei Jinxiang, Liu Xiaomeng. Dual-channel multimodal sentiment analysis model based on contrast invariance and reinforcement specificity [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025060731.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025060731