Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2835-2841.DOI: 10.11772/j.issn.1001-9081.2020101676

Special Issue: Artificial Intelligence

• Artificial intelligence •

Visual-textual sentiment analysis method based on multi-level spatial attention

GUO Kexin, ZHANG Yuxiang   

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2020-10-29 Revised: 2021-01-19 Online: 2021-10-10 Published: 2021-01-27
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1533104).


  • Corresponding author: ZHANG Yuxiang
  • About the authors: GUO Kexin, born in 1995 in Tianjin, M. S. candidate. Her research interests include multimodal sentiment analysis and machine learning. ZHANG Yuxiang, born in 1975 in Wuzhai, Shanxi, Ph. D., professor. His research interests include data analysis and processing, and civil aviation informatization.

Abstract: With the continuous popularization of social networks, people increasingly prefer to post reviews that combine images and text, rather than plain text, to express their feelings and opinions. Existing visual-textual sentiment analysis methods consider only the high-level semantic relation between images and texts, and pay little attention to how the low-level visual features and middle-level aesthetic features of images correlate with the sentiment of texts. To address this, a visual-textual sentiment analysis method based on Multi-Level Spatial Attention (MLSA) was proposed. In this method, driven by the text content, MLSA was used to design the feature fusion between images and texts. The fusion not only attends to the image entity features related to the text, but also makes full use of the middle-level aesthetic features and low-level visual features of the image, so as to mine the sentiment co-occurrence between images and texts from multiple perspectives. On two public multimodal sentiment datasets (MVSA_Single and MVSA_Multi), the proposed model outperformed the best of the comparison methods by 0.96 and 1.06 percentage points in accuracy, and by 0.96 and 0.62 percentage points in F1 score, respectively. Experimental results show that jointly analyzing the hierarchical relationship between text features and image features effectively enhances the neural network's ability to capture the emotional semantics of texts and images, and thus predicts the overall sentiment of an image-text pair more accurately.
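The text-driven fusion described in the abstract can be illustrated as attention over spatial image regions at several feature levels, with the attended results concatenated. The following pure-Python sketch is only a conceptual illustration under assumed details (dot-product scoring, and the function names `spatial_attention` and `multi_level_fusion` are hypothetical), not the authors' MLSA implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def spatial_attention(text_vec, region_feats):
    """Text-guided spatial attention over one feature level.

    text_vec:     d-dim text query vector (list of floats)
    region_feats: one d-dim feature vector per spatial region
    Returns the attention-weighted sum of the region features.
    """
    # Score each region by its dot product with the text query (an assumption;
    # the paper may use a learned compatibility function instead).
    scores = [sum(t * r for t, r in zip(text_vec, feat)) for feat in region_feats]
    weights = softmax(scores)
    d = len(text_vec)
    return [sum(w * feat[i] for w, feat in zip(weights, region_feats))
            for i in range(d)]

def multi_level_fusion(text_vec, levels):
    """Apply text-driven attention at each level (e.g. low-level visual,
    mid-level aesthetic, high-level semantic) and concatenate the results."""
    fused = []
    for region_feats in levels:
        fused.extend(spatial_attention(text_vec, region_feats))
    return fused
```

In this sketch, regions aligned with the text query receive higher attention weight at every level, so the concatenated vector carries text-conditioned evidence from all three feature hierarchies before sentiment classification.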

Key words: spatial attention, multi-feature fusion, sentiment analysis, multimodal, social media, neural network

