Multimodal sentiment analysis network with self-supervision and multi-layer cross attention
Kaipeng XUE, Tao XU, Chunjie LIAO
Journal of Computer Applications    2024, 44 (8): 2387-2392.   DOI: 10.11772/j.issn.1001-9081.2023081209
Abstract

Aiming at the problems of incomplete intra-modal information, poor inter-modal interaction, and difficult training in multimodal sentiment analysis, a Multimodal Sentiment analysis network with Self-supervision and Multi-layer cross Attention fusion (MSSM) was proposed, applying a Visual-and-Language Pre-training (VLP) model to the field of multimodal sentiment analysis. The visual encoder module was enhanced through self-supervised learning, and multi-layer cross attention was added to better model textual and visual features, making the intra-modal information richer and more complete and the inter-modal interaction more sufficient. In addition, FlashAttention, a fast and memory-efficient exact attention algorithm with IO-awareness, was adopted to address the high computational complexity of attention in the Transformer. Experimental results show that, compared with the current mainstream model Contrastive Language-Image Pre-training (CLIP), MSSM improves accuracy by 3.6 percentage points on the processed MVSA-S dataset and by 2.2 percentage points on the MVSA-M dataset, proving that the proposed network can effectively improve the integrity of multimodal information fusion while reducing computational cost.
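The abstract gives no implementation details, but the fusion mechanism it describes can be illustrated. Below is a minimal PyTorch sketch of multi-layer text-vision cross attention; all module names, dimensions, and the layer count are illustrative assumptions, not the authors' code. `torch.nn.functional.scaled_dot_product_attention` is used because, on supported GPUs, PyTorch can dispatch it to a FlashAttention kernel, in the spirit of the IO-aware exact attention the paper adopts.

```python
# Minimal sketch (assumed structure, not the MSSM implementation) of
# multi-layer cross attention: text queries attend to visual keys/values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionLayer(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.h = num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # text: (B, Lt, D) queries; vis: (B, Lv, D) keys/values.
        B, Lt, D = text.shape
        d = D // self.h
        q = self.q_proj(text).view(B, Lt, self.h, d).transpose(1, 2)
        k = self.k_proj(vis).view(B, -1, self.h, d).transpose(1, 2)
        v = self.v_proj(vis).view(B, -1, self.h, d).transpose(1, 2)
        # On supported GPUs this call can dispatch to a FlashAttention
        # kernel, i.e. fast, memory-efficient, IO-aware exact attention.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(B, Lt, D)
        # Residual connection preserves the intra-modal text information.
        return self.norm(text + self.o_proj(out))

class MultiLayerCrossAttention(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            [CrossAttentionLayer(dim, num_heads) for _ in range(num_layers)]
        )

    def forward(self, text: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # Each layer deepens the inter-modal interaction.
        for layer in self.layers:
            text = layer(text, vis)
        return text

# Usage: fuse BERT-style token features with ViT-style patch features.
fusion = MultiLayerCrossAttention()
text = torch.randn(2, 32, 768)   # (batch, text tokens, dim)
image = torch.randn(2, 49, 768)  # (batch, 7x7 image patches, dim)
fused = fusion(text, image)      # (2, 32, 768)
```

Treating text features as queries over visual keys/values lets each stacked layer refine the cross-modal alignment, while the residual path keeps the original text representation intact.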
