Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 79-85. DOI: 10.11772/j.issn.1001-9081.2023060815

• Cross-media Representation Learning and Cognitive Reasoning •


Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis

Junhao LUO1, Yan ZHU2

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
    2. Leeds Joint School, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received:2023-06-26 Revised:2023-09-12 Accepted:2023-09-13 Online:2023-09-20 Published:2024-01-10
  • Contact: Yan ZHU
  • About author:LUO Junhao, born in 1999, M. S. candidate. His research interests include multimodal data mining, sentiment analysis.
    ZHU Yan, born in 1965, Ph. D., professor, CCF member. Her research interests include data mining, Web anomaly pattern discovery, big data management and intelligent analysis.
  • Supported by:
Science and Technology Plan of Sichuan Province (2019YFSY0032)


Abstract:

Considering that the word alignment commonly used by existing methods for aligned multimodal language sequence sentiment analysis lacks interpretability, a Multi-Dynamic Aware Network (MultiDAN) for unaligned multimodal language sequence sentiment analysis was proposed. The core of MultiDAN is multi-layer, multi-angle extraction of dynamics. Firstly, a Recurrent Neural Network (RNN) and an attention mechanism were used to capture the dynamics within each modality. Secondly, the intra- and inter-modal, long- and short-term dynamics were extracted in a single pass with a Graph Attention neTwork (GAT). Finally, the intra- and inter-modal dynamics of the graph nodes were extracted once more with a special graph readout method to obtain a unique representation of the multimodal language sequence, and the sentiment score of the sequence was obtained with a MultiLayer Perceptron (MLP) classifier. Experimental results on two widely used public datasets, CMU-MOSI and CMU-MOSEI, show that MultiDAN extracts the dynamics sufficiently: on the two unaligned datasets, the F1 score of MultiDAN is 0.49 and 0.72 percentage points higher, respectively, than that of Modal-Temporal Attention Graph (MTAG), the best-performing comparison method, and MultiDAN also shows high stability. Therefore, MultiDAN improves the sentiment analysis performance on multimodal language sequences, and Graph Neural Networks (GNNs) can effectively extract intra- and inter-modal dynamics.

Key words: sentiment analysis, multimodal language sequence, multimodal fusion, graph neural network, attention mechanism
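
The abstract sketches a three-stage pipeline: per-modality RNN-plus-attention encoding, graph attention over a node set built from the time steps of all modalities, and a graph readout followed by MLP classification. To make that pipeline concrete, below is a minimal PyTorch sketch written from the abstract alone; it is not the authors' implementation. Every name (IntraModalEncoder, GraphAttentionLayer, MultiDANSketch), the GRU and single-head attention choices, the dense graph attention formulation, and the mean-pooling readout (a stand-in for the paper's special graph readout method) are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class IntraModalEncoder(nn.Module):
        """Stage 1: RNN plus self-attention over one modality's sequence."""
        def __init__(self, in_dim, hid_dim):
            super().__init__()
            self.rnn = nn.GRU(in_dim, hid_dim, batch_first=True)
            self.attn = nn.MultiheadAttention(hid_dim, num_heads=1, batch_first=True)

        def forward(self, x):                      # x: (1, seq_len, in_dim)
            h, _ = self.rnn(x)                     # intra-modal temporal dynamics
            h, _ = self.attn(h, h, h)              # intra-modal attention
            return h.squeeze(0)                    # (seq_len, hid_dim)

    class GraphAttentionLayer(nn.Module):
        """Stage 2: dense single-head graph attention. adj encodes both
        intra-modal (temporal) and inter-modal edges, so one pass mixes
        long/short-term and cross-modal dynamics."""
        def __init__(self, dim):
            super().__init__()
            self.w = nn.Linear(dim, dim, bias=False)
            self.a = nn.Linear(2 * dim, 1, bias=False)

        def forward(self, nodes, adj):             # nodes: (N, dim), adj: (N, N)
            h = self.w(nodes)
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                               h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            scores = F.leaky_relu(self.a(pairs).squeeze(-1))
            # Every node needs at least one edge (e.g. a self-loop), otherwise
            # softmax over an all-masked row produces NaN.
            scores = scores.masked_fill(adj == 0, float('-inf'))
            alpha = torch.softmax(scores, dim=-1)  # attention over neighbours
            return F.elu(alpha @ h)                # (N, dim)

    class MultiDANSketch(nn.Module):
        """Stages 1-3 chained; mean pooling is only a placeholder for the
        paper's special graph readout method."""
        def __init__(self, dims, hid_dim=64):
            super().__init__()
            self.encoders = nn.ModuleDict(
                {m: IntraModalEncoder(d, hid_dim) for m, d in dims.items()})
            self.gat = GraphAttentionLayer(hid_dim)
            self.head = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, 1))   # sentiment score

        def forward(self, seqs, adj):
            # One graph node per time step of every modality; the sequences
            # need not be word-aligned, only the adjacency matrix links them.
            nodes = torch.cat([self.encoders[m](x) for m, x in seqs.items()], dim=0)
            nodes = self.gat(nodes, adj)
            return self.head(nodes.mean(dim=0))    # graph readout -> MLP

A toy call, with deliberately unaligned sequence lengths (the feature sizes 300/74/35 are placeholders, not prescribed by the abstract):

    model = MultiDANSketch({'text': 300, 'audio': 74, 'video': 35})
    seqs = {'text': torch.randn(1, 20, 300),
            'audio': torch.randn(1, 50, 74),
            'video': torch.randn(1, 60, 35)}      # unaligned lengths: 20, 50, 60
    adj = torch.ones(130, 130)                    # toy fully connected graph, N = 130
    score = model(seqs, adj)                      # scalar sentiment score

Since CMU-MOSI and CMU-MOSEI annotate sentiment as a continuous score in [-3, 3], the head ends in a single scalar output; thresholding that score is what is commonly used to derive the binary labels behind the reported F1 values.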
