Journal of Computer Applications, 2024, Vol. 44, Issue (1): 79-85. DOI: 10.11772/j.issn.1001-9081.2023060815
• Cross-media representation learning and cognitive reasoning •
Junhao LUO, Yan ZHU
Received: 2023-06-26
Revised: 2023-09-12
Accepted: 2023-09-13
Online: 2023-09-20
Published: 2024-01-10
Contact: Yan ZHU
About author: LUO Junhao, born in 1999 in Chengdu, Sichuan, M.S. candidate. His research interests include multimodal data mining and sentiment analysis.
Junhao LUO, Yan ZHU. Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis[J]. Journal of Computer Applications, 2024, 44(1): 79-85.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023060815
| Dataset | Training set | Validation set | Test set | Total |
|---|---|---|---|---|
| CMU-MOSI | 1 283 | 229 | 686 | 2 198 |
| CMU-MOSEI | 16 326 | 1 871 | 4 659 | 22 856 |

Tab. 1 Statistics of CMU-MOSI and CMU-MOSEI datasets
| Dataset | Alignment | Method | Acc7/% | Acc2/% | F1/% | MAE | Corr |
|---|---|---|---|---|---|---|---|
| CMU-MOSI | Aligned | LF-DNN | 34.66 | 78.70 | 78.65 | 94.63 | 66.36 |
| | | TFN | 34.55 | 77.10 | 77.18 | 97.76 | 65.09 |
| | | LMF | 35.01 | 78.69 | 78.68 | 94.41 | 67.19 |
| | | MulT | 34.05 | 79.27 | 79.33 | 94.71 | 67.43 |
| | | MTAG | 34.40 | 79.88 | 79.97 | 92.33 | 68.72 |
| | | MultiDAN | 35.28 | 80.64 | 80.49 | 91.70 | 67.76 |
| | Unaligned | LF-DNN | 34.55 | 79.60 | 79.51 | 93.08 | 66.57 |
| | | TFN | 35.92 | 78.20 | 78.29 | 93.89 | 64.41 |
| | | LMF | 34.43 | 78.69 | 78.78 | 94.77 | 66.73 |
| | | MulT | 34.55 | 80.34 | 80.30 | 93.93 | 69.08 |
| | | MTAG | 36.88 | 82.32 | 82.29 | 88.29 | 71.88 |
| | | MultiDAN | 37.46 | 82.77 | 82.78 | 87.86 | 72.44 |
| CMU-MOSEI | Aligned | LF-DNN | 52.04 | 82.64 | 82.31 | 56.34 | 73.06 |
| | | TFN | 51.31 | 81.21 | 81.29 | 57.26 | 72.00 |
| | | LMF | 51.82 | 83.14 | 83.14 | 56.90 | 73.16 |
| | | MulT | 52.66 | 83.79 | 83.74 | 56.12 | 73.33 |
| | | MTAG | 52.31 | 84.18 | 84.12 | 55.98 | 74.04 |
| | | MultiDAN | 53.62 | 85.06 | 84.95 | 55.32 | 73.82 |
| | Unaligned | LF-DNN | 51.66 | 83.65 | 83.11 | 56.16 | 72.91 |
| | | TFN | 51.62 | 82.99 | 82.47 | 58.05 | 71.36 |
| | | LMF | 52.14 | 83.67 | 83.61 | 56.40 | 73.16 |
| | | MulT | 52.31 | 84.01 | 83.97 | 55.82 | 73.27 |
| | | MTAG | 51.99 | 84.53 | 84.39 | 55.61 | 74.45 |
| | | MultiDAN | 53.04 | 85.17 | 85.11 | 54.52 | 75.25 |

Tab. 2 Experimental results on CMU-MOSI and CMU-MOSEI datasets
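The metrics in Tab. 2 follow the evaluation protocol commonly used on CMU-MOSI/CMU-MOSEI: Acc7 is seven-class accuracy obtained by clipping the continuous sentiment score to [-3, 3] and rounding, Acc2 and F1 are computed on binary positive/negative polarity, MAE is the mean absolute error of the regression output (reported above scaled by 100, as is Corr, the Pearson correlation). Below is a minimal sketch of these metrics, assuming continuous predictions and labels in [-3, 3]; whether neutral (zero-label) samples are excluded from Acc2/F1 varies between papers, and the paper's exact protocol is not reproduced here.

```python
import numpy as np
from sklearn.metrics import f1_score

def mosi_metrics(y_true, y_pred):
    """Common CMU-MOSI/MOSEI metrics from continuous scores in [-3, 3]."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)

    # Acc7: seven-class accuracy after clipping to [-3, 3] and rounding.
    acc7 = np.mean(np.round(np.clip(y_pred, -3, 3))
                   == np.round(np.clip(y_true, -3, 3)))

    # Acc2 / F1: binary positive vs. negative, excluding neutral labels
    # (one common convention; some papers keep the zero-label samples).
    nonzero = y_true != 0
    bin_true, bin_pred = y_true[nonzero] > 0, y_pred[nonzero] > 0
    acc2 = np.mean(bin_true == bin_pred)
    f1 = f1_score(bin_true, bin_pred, average="weighted")

    # MAE and Pearson correlation on the raw regression outputs.
    mae = np.mean(np.abs(y_pred - y_true))
    corr = np.corrcoef(y_pred, y_true)[0, 1]
    return {"Acc7": acc7, "Acc2": acc2, "F1": f1, "MAE": mae, "Corr": corr}
```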
| Comparison | Ablation setting | Acc7/% | Acc2/% | F1/% | MAE | Corr |
|---|---|---|---|---|---|---|
| Intra-modal interaction extraction method | 1D convolution | 35.57 | 79.42 | 79.36 | 91.59 | 68.56 |
| | BiGRU | 35.13 | 80.79 | 80.87 | 94.68 | 66.64 |
| | Transformer | 32.07 | 80.64 | 80.72 | 91.78 | 68.73 |
| | MHSA | 36.44 | 80.03 | 80.05 | 94.22 | 64.39 |
| Attention mechanism and positional embedding | No attention | 35.57 | 79.73 | 79.73 | 93.84 | 67.78 |
| | No positional embedding | 37.03 | 79.88 | 79.91 | 91.28 | 66.69 |
| | BiLSTM only | 35.57 | 78.96 | 79.07 | 96.09 | 65.89 |
| Pruning strategy | Random 80% | 36.05 | 78.81 | 78.89 | 95.28 | 65.98 |
| | TopK 60% | 36.88 | 77.74 | 77.87 | 94.78 | 67.22 |
| | No pruning | 36.01 | 78.96 | 79.05 | 93.97 | 65.95 |
| Effectiveness of edge-type identifiers in graph | Temporal identifiers only | 32.94 | 78.96 | 79.08 | 97.25 | 67.37 |
| | Modal identifiers only | 37.14 | 80.18 | 80.17 | 91.42 | 68.11 |
| | No edge identifiers | 37.17 | 79.73 | 79.76 | 93.13 | 66.82 |
| Graph readout method | Readout and concatenate | 35.71 | 80.79 | 80.83 | 93.99 | 67.03 |
| | GMT | 36.59 | 80.03 | 80.10 | 94.30 | 66.38 |
| Multi-dynamic aware network | MultiDAN | 37.46 | 82.77 | 82.78 | 87.86 | 72.44 |

Tab. 3 Ablation experiment results on unaligned CMU-MOSI dataset
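Two of the ablation rows above compare graph-edge pruning strategies: keeping a random 80% of edges versus keeping the 60% of edges with the highest attention scores (TopK), against no pruning at all. Below is a minimal sketch of both strategies, assuming the multimodal graph is stored as an edge_index tensor with one attention weight per edge; function and parameter names (topk_edge_pruning, keep_ratio) are illustrative, not the paper's implementation.

```python
import torch

def topk_edge_pruning(edge_index: torch.Tensor, edge_weight: torch.Tensor,
                      keep_ratio: float = 0.6):
    """Keep the `keep_ratio` fraction of edges with the largest weights."""
    num_keep = max(1, int(keep_ratio * edge_weight.numel()))
    keep = torch.topk(edge_weight, num_keep).indices
    return edge_index[:, keep], edge_weight[keep]

def random_edge_pruning(edge_index: torch.Tensor, edge_weight: torch.Tensor,
                        keep_ratio: float = 0.8):
    """Baseline: keep a uniformly random `keep_ratio` fraction of edges."""
    num_keep = max(1, int(keep_ratio * edge_weight.numel()))
    keep = torch.randperm(edge_weight.numel())[:num_keep]
    return edge_index[:, keep], edge_weight[keep]
```

In graph attention settings, pruning low-weight edges bounds the cost of message passing as cross-modal edges accumulate; Tab. 3 shows that neither strategy alone matches the full MultiDAN configuration.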
| Sample | True polarity | Predicted polarity |
|---|---|---|
| Text: "John Goodman absolutely amazing!" Visual: head raised, eyes closed. Audio: raised volume, affirmative tone | Positive | Positive |
| Text: "But regardless I thought it was a very good movie." Visual: raised eyebrows. Audio: raised volume, approving tone | Positive | Positive |
| Text: "That's kind of crazy." Visual: outstretched hands, smiling face. Audio: calm voice, laughter | Negative | Negative |

Tab. 4 Case study results
[1] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions [J]. Information Fusion, 2023, 91: 424-444. DOI: 10.1016/j.inffus.2022.09.025
[2] YANG J, WANG Y, YI R, et al. MTAG: modal-temporal attention graph for unaligned human multimodal language sequences [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2021: 1009-1021. DOI: 10.18653/v1/2021.naacl-main.79
[3] WANG X, BO D, SHI C, et al. A survey on heterogeneous graph embedding: methods, techniques, applications and sources [J]. IEEE Transactions on Big Data, 2022, 9(2): 415-436. DOI: 10.1109/tbdata.2022.3177455
[4] YANG T, HU L, SHI C, et al. Heterogeneous graph attention networks for semi-supervised short text classification [J]. ACM Transactions on Information Systems, 2021, 39(3): Article No. 32. DOI: 10.1145/3450352
[5] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [EB/OL]. (2018-02-04) [2023-08-24].
[6] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis [C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017: 1103-1114. DOI: 10.18653/v1/d17-1115
[7] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning [C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 5634-5641. DOI: 10.1609/aaai.v32i1.12021
[8] YANG B, SHAO B, WU L, et al. Multimodal sentiment analysis with unidirectional modality translation [J]. Neurocomputing, 2022, 467: 130-137. DOI: 10.1016/j.neucom.2021.09.041
[9] TSAI Y-H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6558-6569. DOI: 10.18653/v1/p19-1656
[10] LIAN Z, LIU B, TAO J. CTNet: conversational transformer network for emotion recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 985-1000. DOI: 10.1109/taslp.2021.3049898
[11] ZENG Y, LI Z, TANG Z, et al. Heterogeneous graph convolution based on in-domain self-supervision for multimodal sentiment analysis [J]. Expert Systems with Applications, 2023, 213: 119240. DOI: 10.1016/j.eswa.2022.119240
[12] LIN Z, LIANG B, LONG Y, et al. Modeling intra- and inter-modal relations: hierarchical graph contrastive learning for multimodal sentiment analysis [C]// Proceedings of the 29th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2022: 7124-7135.
[13] WANG X, JI H, SHI C, et al. Heterogeneous graph attention network [C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 2022-2032. DOI: 10.1145/3308558.3313562
[14] WU Z, JAIN P, WRIGHT M, et al. Representing long-range context for graph neural networks with global attention [EB/OL]. [2023-08-24].
[15] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages [J]. IEEE Intelligent Systems, 2016, 31(6): 82-88. DOI: 10.1109/mis.2016.94
[16] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph [C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018, 1: 2236-2246. DOI: 10.18653/v1/p18-1208
[17] YU W, XU H, MENG F, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 3718-3727. DOI: 10.18653/v1/2020.acl-main.343
[18] ZHANG S, YIN C Y. Sequential multimodal sentiment analysis model based on multi-task learning [J]. Journal of Computer Applications, 2021, 41(6): 1631-1639.
[19] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors [C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018, 1: 2247-2256. DOI: 10.18653/v1/p18-1209
[20] BAEK J, KANG M, HWANG S J. Accurate learning of graph representations with graph multiset pooling [EB/OL]. (2021-06-28) [2023-08-24].