Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 79-85. DOI: 10.11772/j.issn.1001-9081.2023060815

• Cross-media Representation Learning and Cognitive Reasoning •


Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis

Junhao LUO1, Yan ZHU2

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
    2. Leeds Joint School, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received:2023-06-26 Revised:2023-09-12 Accepted:2023-09-13 Online:2023-09-20 Published:2024-01-10
  • Contact: Yan ZHU
  • About author:LUO Junhao, born in 1999, M. S. candidate. His research interests include multimodal data mining, sentiment analysis.
    ZHU Yan, born in 1965, Ph. D., professor, CCF member. Her research interests include data mining, Web anomaly pattern discovery, big data management and intelligent analysis.
  • Supported by:
Science and Technology Plan of Sichuan Province (2019YFSY0032)


Abstract:

Considering that the word alignment commonly used by existing methods for aligned multimodal language sequence sentiment analysis lacks interpretability, a Multi-Dynamic Aware Network (MultiDAN) for unaligned multimodal language sequence sentiment analysis was proposed. The core of MultiDAN is multi-layer, multi-angle extraction of dynamics. Firstly, a Recurrent Neural Network (RNN) and an attention mechanism were used to capture the dynamics within each modality. Secondly, the intra- and inter-modal, long- and short-term dynamics were extracted in a single pass with a Graph Attention neTwork (GAT). Finally, the intra- and inter-modal dynamics of the graph nodes were extracted once more with a special graph readout method to obtain a unique representation of the multimodal language sequence, and the sentiment score of the sequence was obtained with a MultiLayer Perceptron (MLP) classifier. Experimental results on two widely used public datasets, CMU-MOSI and CMU-MOSEI, show that MultiDAN extracts the dynamics sufficiently: on the two unaligned datasets, the F1 score of MultiDAN is 0.49 and 0.72 percentage points higher, respectively, than that of Modal-Temporal Attention Graph (MTAG), the best-performing comparison method, and MultiDAN also shows high stability. Therefore, MultiDAN improves the sentiment analysis performance on multimodal language sequences, and Graph Neural Networks (GNNs) can effectively extract intra- and inter-modal dynamics.

Key words: sentiment analysis, multimodal language sequence, multimodal fusion, graph neural network, attention mechanism
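
The abstract sketches a three-stage pipeline: per-modality RNN-plus-attention encoding, graph attention over a node set built from the time steps of all modalities, and a graph readout followed by MLP classification. To make that pipeline concrete, below is a minimal PyTorch sketch written from the abstract alone; it is not the authors' implementation. Every name (IntraModalEncoder, GraphAttentionLayer, MultiDANSketch), the GRU and single-head attention choices, the dense graph attention formulation, and the mean-pooling readout (a stand-in for the paper's special graph readout method) are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class IntraModalEncoder(nn.Module):
        """Stage 1: RNN plus self-attention over one modality's sequence."""
        def __init__(self, in_dim, hid_dim):
            super().__init__()
            self.rnn = nn.GRU(in_dim, hid_dim, batch_first=True)
            self.attn = nn.MultiheadAttention(hid_dim, num_heads=1, batch_first=True)

        def forward(self, x):                      # x: (1, seq_len, in_dim)
            h, _ = self.rnn(x)                     # intra-modal temporal dynamics
            h, _ = self.attn(h, h, h)              # intra-modal attention
            return h.squeeze(0)                    # (seq_len, hid_dim)

    class GraphAttentionLayer(nn.Module):
        """Stage 2: dense single-head graph attention. adj encodes both
        intra-modal (temporal) and inter-modal edges, so one pass mixes
        long/short-term and cross-modal dynamics."""
        def __init__(self, dim):
            super().__init__()
            self.w = nn.Linear(dim, dim, bias=False)
            self.a = nn.Linear(2 * dim, 1, bias=False)

        def forward(self, nodes, adj):             # nodes: (N, dim), adj: (N, N)
            h = self.w(nodes)
            n = h.size(0)
            pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                               h.unsqueeze(0).expand(n, n, -1)], dim=-1)
            scores = F.leaky_relu(self.a(pairs).squeeze(-1))
            # Every node needs at least one edge (e.g. a self-loop), otherwise
            # softmax over an all-masked row produces NaN.
            scores = scores.masked_fill(adj == 0, float('-inf'))
            alpha = torch.softmax(scores, dim=-1)  # attention over neighbours
            return F.elu(alpha @ h)                # (N, dim)

    class MultiDANSketch(nn.Module):
        """Stages 1-3 chained; mean pooling is only a placeholder for the
        paper's special graph readout method."""
        def __init__(self, dims, hid_dim=64):
            super().__init__()
            self.encoders = nn.ModuleDict(
                {m: IntraModalEncoder(d, hid_dim) for m, d in dims.items()})
            self.gat = GraphAttentionLayer(hid_dim)
            self.head = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, 1))   # sentiment score

        def forward(self, seqs, adj):
            # One graph node per time step of every modality; the sequences
            # need not be word-aligned, only the adjacency matrix links them.
            nodes = torch.cat([self.encoders[m](x) for m, x in seqs.items()], dim=0)
            nodes = self.gat(nodes, adj)
            return self.head(nodes.mean(dim=0))    # graph readout -> MLP

A toy call, with deliberately unaligned sequence lengths (the feature sizes 300/74/35 are placeholders, not prescribed by the abstract):

    model = MultiDANSketch({'text': 300, 'audio': 74, 'video': 35})
    seqs = {'text': torch.randn(1, 20, 300),
            'audio': torch.randn(1, 50, 74),
            'video': torch.randn(1, 60, 35)}      # unaligned lengths: 20, 50, 60
    adj = torch.ones(130, 130)                    # toy fully connected graph, N = 130
    score = model(seqs, adj)                      # scalar sentiment score

Since CMU-MOSI and CMU-MOSEI annotate sentiment as a continuous score in [-3, 3], the head ends in a single scalar output; thresholding that score is what is commonly used to derive the binary labels behind the reported F1 values.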
