Journal of Computer Applications
XUE Kaipeng, LIAO Chunjie, XU Tao
Abstract: To address the problems of incomplete intra-modal information, poor inter-modal interaction, and difficulty of training in multimodal sentiment analysis tasks, a visual-language pre-training (VLP) model is applied to the field of multimodal sentiment analysis, and a multimodal sentiment analysis network (Multimodal EmotionNet fusing Self-supervised learning and Multi-layer cross attention, MESM) is proposed. The visual encoder module is strengthened by self-supervised learning, and multi-layer cross attention is added to better model textual and visual features, making intra-modal information richer and more complete and inter-modal information interaction more adequate. The high complexity of attention computation in the Transformer is reduced by Flash Attention, a fast, memory-efficient exact attention algorithm with IO-awareness. Compared with the current mainstream models TomBERT, CLIP, ViLT, and ViLBERT, MESM achieves the best accuracy and recall on the processed MVSA dataset, reaching 71.3% and 69.2% respectively, which shows that the method can effectively improve the completeness of multimodal information fusion while reducing computation cost.
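The multi-layer cross attention described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the single-head formulation, the weight shapes, and the residual connection across layers are all assumptions made for clarity. Each layer lets the text features (queries) attend over the visual features (keys and values), so repeated layers deepen the inter-modal interaction.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feats, visual_feats, Wq, Wk, Wv):
    """One cross-attention layer: text queries attend over visual keys/values.

    text_feats:   (n_text, d)   visual_feats: (n_visual, d)
    Wq, Wk, Wv:   (d, d) projection matrices (hypothetical shapes).
    """
    Q = text_feats @ Wq
    K = visual_feats @ Wk
    V = visual_feats @ Wv
    # Scaled dot-product attention scores: (n_text, n_visual).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def multilayer_cross_attention(text_feats, visual_feats, layer_weights):
    # Stack several cross-attention layers; the residual connection
    # (an assumption here) keeps the original intra-modal information.
    x = text_feats
    for Wq, Wk, Wv in layer_weights:
        x = x + cross_attention(x, visual_feats, Wq, Wk, Wv)
    return x

# Usage with random features: 5 text tokens, 7 visual patches, dim 8, 2 layers.
rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))
visual = rng.standard_normal((7, d))
weights = [tuple(rng.standard_normal((d, d)) for _ in range(3)) for _ in range(2)]
fused = multilayer_cross_attention(text, visual, weights)  # shape (5, 8)
```

Flash Attention, also mentioned in the abstract, computes the same scaled dot-product attention but in tiles that fit on-chip memory, avoiding materializing the full score matrix; the dense `scores` array above is exactly what it eliminates.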
Key words: multimodal, sentiment analysis, self-supervision, attention mechanism, visual-language pre-training model
CLC Number: TP391.41
XUE Kaipeng, LIAO Chunjie, XU Tao. Multimodal sentiment analysis network fusing self-supervision and multi-layer cross attention[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2023081209.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023081209