Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (6): 1767-1775.DOI: 10.11772/j.issn.1001-9081.2025060731
Artificial intelligence
Yunping HE, Leichun WANG, Ruirui SONG, Xiangfeng LU, Jinxiang WEI, Xiaomeng LIU
Received:2025-07-02
Revised:2025-08-25
Accepted:2025-08-28
Online:2025-09-05
Published:2026-06-10
Contact: Leichun WANG
About author: HE Yunping, born in 2000, M. S. candidate. His research interests include multimodal sentiment analysis and long time series prediction.
HE Yunping is from Jingzhou, Hubei, and is a CCF member.
Yunping HE, Leichun WANG, Ruirui SONG, Xiangfeng LU, Jinxiang WEI, Xiaomeng LIU. Dual-channel multimodal sentiment analysis model based on contrast invariance and reinforcement specificity[J]. Journal of Computer Applications, 2026, 46(6): 1767-1775.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025060731
| Parameter | Value |
|---|---|
| batch_size | 16 |
| early_stop | 4 |
| nlevels | 4 |
| learning_rate | 0.0001 |
| grad_clip | 0.6 |
Tab. 1 Experimental parameters
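Tab. 1 gives hyperparameter names and values but not how they are used. The sketch below is a minimal, framework-free illustration of one plausible reading, and the interpretations are assumptions rather than statements from the paper: `early_stop` as a patience counter (epochs without a new best validation loss), `grad_clip` as a maximum global L2 norm for the gradient, and `nlevels` as a layer count (the public MulT code uses an option of the same name for the number of layers). The names `TrainConfig`, `clip_by_global_norm` and `EarlyStopper` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Values from Tab. 1; field meanings beyond the names are assumptions."""
    batch_size: int = 16
    early_stop: int = 4        # assumed: patience, in epochs without improvement
    nlevels: int = 4           # assumed: number of stacked layers
    learning_rate: float = 1e-4
    grad_clip: float = 0.6     # assumed: max global L2 norm of the gradient


def clip_by_global_norm(grads: Sequence[float], max_norm: float) -> List[float]:
    """Rescale the whole gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


class EarlyStopper:
    """Signal a stop after `patience` epochs with no new best validation loss."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        # A strictly lower loss resets the counter; anything else counts as a bad epoch.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

For example, with `grad_clip = 0.6` a gradient `[3.0, 4.0]` (norm 5) is rescaled to `[0.36, 0.48]`, and the validation-loss sequence 1.0, 0.9, 0.95, 0.92, 0.91, 0.93 triggers a stop at the sixth epoch with `early_stop = 4`. PyTorch's `torch.nn.utils.clip_grad_norm_` applies the same rescaling, with a small epsilon added to the norm.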
| Dataset | Model | MAE | Acc-2/% | F1-score/% | Acc-7/% |
|---|---|---|---|---|---|
| CMU-MOSI | MulT | 0.87 | 83.0 | 82.8 | 40.0 |
| | MISA | 0.82 | 83.4 | 83.6 | 42.3 |
| | CubeMLP | 0.77 | 85.6 | 85.5 | 45.5 |
| | LMF | 0.92 | 82.5 | 82.4 | 33.2 |
| | MFM | 0.88 | 81.7 | 81.6 | 35.4 |
| | ICCN | 0.86 | 83.0 | 83.0 | 39.0 |
| | MAG-BERT | 0.73 | 84.4 | 84.6 | 43.6 |
| | Self-MM | 0.71 | 84.8 | 84.9 | 45.8 |
| | ALMT | 0.71 | 85.0 | 84.9 | 46.7 |
| | ConFEDE | 0.71 | 84.8 | 84.8 | 42.0 |
| | DLF | 0.73 | 85.1 | 85.0 | 47.1 |
| | CIRS | 0.70 | 86.2 | 86.1 | 47.8 |
| CMU-MOSEI | MulT | 0.58 | 82.5 | 82.3 | 51.8 |
| | MISA | 0.56 | 83.4 | 83.6 | 42.3 |
| | CubeMLP | 0.52 | 85.1 | 84.5 | 52.9 |
| | LMF | 0.62 | 82.0 | 82.1 | 48.0 |
| | MFM | 0.57 | 84.4 | 84.3 | 51.3 |
| | ICCN | 0.57 | 84.2 | 84.2 | 51.6 |
| | MAG-BERT | 0.54 | 84.8 | 84.7 | 52.7 |
| | Self-MM | 0.53 | 85.0 | 85.0 | 53.5 |
| | ALMT | 0.54 | 84.5 | 84.5 | 53.2 |
| | ConFEDE | 0.55 | 84.8 | 84.6 | 53.0 |
| | DLF | 0.54 | 85.4 | 85.3 | 53.9 |
| | CIRS | 0.53 | 86.0 | 86.1 | 53.8 |
Tab. 2 Results of comparison experiments with different models on the two datasets
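The page reports MAE, Acc-2, F1-score and Acc-7 without defining them. Below is a minimal sketch of the evaluation convention widely used for CMU-MOSI and CMU-MOSEI, assumed here rather than confirmed by the paper: gold sentiment scores lie in [-3, 3]; Acc-7 clips both scores to that range and rounds them to seven integer classes; Acc-2 and F1-score compare positive against negative after discarding samples whose gold score is exactly 0, with F1 weighted by class support. Some papers instead keep zero-labelled samples and group them with the positive class, so which variant Tab. 2 uses cannot be told from this excerpt. All function names are hypothetical, and Tab. 2 reports the last three metrics as percentages (these values times 100).

```python
from typing import List, Sequence, Tuple


def mae(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Mean absolute error between predicted and gold sentiment scores."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


def acc7(preds: Sequence[float], labels: Sequence[float]) -> float:
    """7-class accuracy: clip scores to [-3, 3], then round to the nearest integer.

    Python's round() rounds halves to even, as NumPy's np.round does.
    """
    def to_class(v: float) -> int:
        return round(max(-3.0, min(3.0, v)))

    hits = sum(to_class(p) == to_class(y) for p, y in zip(preds, labels))
    return hits / len(labels)


def binary_non_zero(preds: Sequence[float],
                    labels: Sequence[float]) -> Tuple[List[bool], List[bool]]:
    """Positive/negative (truth, pred) labels, skipping gold scores equal to 0."""
    kept = [(y > 0, p > 0) for p, y in zip(preds, labels) if y != 0]
    return [t for t, _ in kept], [p for _, p in kept]


def acc2(preds: Sequence[float], labels: Sequence[float]) -> float:
    truth, pred = binary_non_zero(preds, labels)
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def weighted_f1(truth: Sequence[bool], pred: Sequence[bool]) -> float:
    """Per-class F1 averaged with weights equal to each class's gold support."""
    total = len(truth)
    score = 0.0
    for c in (False, True):
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        support = tp + fn
        denom = 2 * tp + fp + fn
        if support and denom:
            score += (2 * tp / denom) * support / total
    return score
```

On the toy inputs `preds = [2.6, -1.2, 0.4, -0.2, 1.1]` and `labels = [3.0, -1.0, 0.0, 0.7, 1.6]`, this gives MAE 0.48, Acc-7 0.6 and Acc-2 0.75 (the zero-labelled third sample is dropped). `weighted_f1` matches scikit-learn's `f1_score(..., average="weighted")` for two classes.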
| CMD | DE | FCE | CMU-MOSI MAE | CMU-MOSI Acc-2/% | CMU-MOSI F1-score/% | CMU-MOSI Acc-7/% | CMU-MOSEI MAE | CMU-MOSEI Acc-2/% | CMU-MOSEI F1-score/% | CMU-MOSEI Acc-7/% |
|---|---|---|---|---|---|---|---|---|---|---|
| × | × | × | 0.85 | 82.6 | 82.1 | 42.9 | 0.57 | 82.2 | 82.8 | 52.1 |
| √ | × | × | 0.80 | 84.7 | 84.7 | 45.5 | 0.54 | 84.5 | 84.5 | 53.4 |
| × | √ | × | 0.79 | 84.6 | 84.5 | 45.8 | 0.55 | 84.4 | 84.4 | 53.5 |
| × | × | √ | 0.80 | 84.5 | 84.6 | 44.7 | 0.57 | 84.3 | 84.3 | 53.0 |
| √ | √ | × | 0.75 | 85.3 | 85.3 | 46.8 | 0.54 | 85.5 | 85.6 | 53.6 |
| √ | × | √ | 0.72 | 85.6 | 85.6 | 46.5 | 0.54 | 85.5 | 85.5 | 53.4 |
| × | √ | √ | 0.74 | 85.2 | 85.2 | 46.3 | 0.55 | 85.1 | 85.1 | 53.6 |
| √ | √ | √ | 0.70 | 86.2 | 86.1 | 47.8 | 0.53 | 86.0 | 86.1 | 53.8 |
Tab. 3 Results of ablation experiments
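Tab. 3 switches three modules, abbreviated CMD, DE and FCE (the abbreviations are not expanded in this excerpt), on and off. One simple way to read such a table, sketched below with the CMU-MOSI Acc-2 column copied from Tab. 3, is to compare each module's gain when it is added alone to the baseline with the loss when it alone is removed from the full model. The dictionary `ACC2_MOSI` and the function `module_gains` are illustrative names, not part of the paper.

```python
from typing import Dict, Tuple

# CMU-MOSI Acc-2 (%) from Tab. 3; key = whether (CMD, DE, FCE) is switched on.
ACC2_MOSI: Dict[Tuple[bool, bool, bool], float] = {
    (False, False, False): 82.6,
    (True, False, False): 84.7,
    (False, True, False): 84.6,
    (False, False, True): 84.5,
    (True, True, False): 85.3,
    (True, False, True): 85.6,
    (False, True, True): 85.2,
    (True, True, True): 86.2,
}
MODULES = ("CMD", "DE", "FCE")


def module_gains(table: Dict[Tuple[bool, ...], float]) -> Dict[str, Tuple[float, float]]:
    """Map each module to (gain when added alone, loss when removed from full)."""
    base = table[(False, False, False)]
    full = table[(True, True, True)]
    gains = {}
    for i, name in enumerate(MODULES):
        only = tuple(j == i for j in range(len(MODULES)))     # just this module on
        without = tuple(j != i for j in range(len(MODULES)))  # all but this module
        gains[name] = (round(table[only] - base, 1), round(full - table[without], 1))
    return gains


if __name__ == "__main__":
    for name, (alone, removed) in module_gains(ACC2_MOSI).items():
        print(f"{name}: +{alone} alone, -{removed} when removed")
```

On these numbers each module alone adds 1.9 to 2.1 points, while removing any single module from the full model costs only 0.6 to 1.0 points; the full model gains 3.6 points over the baseline, less than the 6.0-point sum of the single-module gains, which suggests their contributions partly overlap on this metric. The same function works for any other column of Tab. 3; for MAE, where lower is better, the signs flip.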
| [1] | ARABIAN H, BATTISTEL A, CHASE J G, et al. Attention-guided network model for image-based emotion recognition[J]. Applied Sciences, 2023, 13(18): No.10179. |
| [2] | KAUSHIK L, SANGWAN A, HANSEN J H L. Automatic sentiment detection in naturalistic audio[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(8): 1668-1679. |
| [3] | YU J, CHEN K, XIA R. Hierarchical interactive multimodal transformer for aspect-based multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2023, 14(3): 1966-1978. |
| [4] | KIM Y. Convolutional neural network for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1746-1751. |
| [5] | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. |
| [6] | SCHULLER B, VLASENKO B, EYBEN F, et al. Cross-corpus acoustic emotion recognition: variances and strategies[J]. IEEE Transactions on Affective Computing, 2010, 1(2): 119-131. |
| [7] | TRIGEORGIS G, RINGEVAL F, BRUECKNER R, et al. Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network[C]// Proceedings of the 2016 International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2016: 5200-5204. |
| [8] | BAEVSKI A, ZHOU Y, MOHAMED A, et al. wav2vec 2.0: a framework for self-supervised learning of speech representations[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 12449-12460. |
| [9] | MACHAJDIK J, HANBURY A. Affective image classification using features inspired by psychology and art theory[C]// Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010: 83-92. |
| [10] | YOU Q, LUO J, JIN H, et al. Robust image sentiment analysis using progressively trained and domain transferred deep networks[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2015: 381-388. |
| [11] | MOLLAHOSSEINI A, HASANI B, MAHOOR M H. AffectNet: a database for facial expression, valence, and arousal computing in the wild[J]. IEEE Transactions on Affective Computing, 2019, 10(1): 18-31. |
| [12] | LI S, DENG W. Deep facial expression recognition: a survey[J]. IEEE Transactions on Affective Computing, 2022, 13(3): 1195-1215. |
| [13] | JIANG D, CUI Y, ZHANG X, et al. Audio visual emotion recognition based on triple-stream dynamic Bayesian network models[C]// Proceedings of the 2011 International Conference on Affective Computing and Intelligent Interaction, LNCS 6974. Berlin: Springer, 2011: 609-618. |
| [14] | ABBURI H, PRASATH R, SHRIVASTAVA M, et al. Multimodal sentiment analysis using deep neural networks[C]// Proceedings of the 2016 International Conference on Mining Intelligence and Knowledge Exploration, LNCS 10089. Cham: Springer, 2017: 58-65. |
| [15] | LIN H, ZHANG P, LING J, et al. PS-Mixer: a polar-vector and strength-vector mixer model for multimodal sentiment analysis[J]. Information Processing and Management, 2023, 60(2): No.103229. |
| [16] | HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1122-1131. |
| [17] | GUO X Y, MA J. Multimodal feature fusion public opinion representation algorithm based on SEFusion-MPOR[J]. Information Studies: Theory and Application, 2024, 47(7): 181-189. (in Chinese) |
| [18] | LI Y, WANG Y, CUI Z. Decoupled multimodal distilling for emotion recognition[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 6631-6640. |
| [19] | GUO Z, JIN T, ZHAO Z. Multimodal prompt learning with missing modalities for sentiment analysis and emotion recognition[C]// Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2024: 1726-1736. |
| [20] | DU J, JIN J, ZHUANG J, et al. Hierarchical graph contrastive learning of local and global presentation for multimodal sentiment analysis[J]. Scientific Reports, 2024, 14: No.5335. |
| [21] | ZONG L L, ZHOU J H, XIE Q J, et al. Multi-modal emotion recognition based on hypergraph[J]. Chinese Journal of Computers, 2023, 46(12): 2520-2534. (in Chinese) |
| [22] | WENG Y, WANG H, GAO T, et al. Enhancing multimodal sentiment analysis for missing modality through self-distillation and unified modality cross-attention[C]// Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2025: 1-5. |
| [23] | BALTRUŠAITIS T, ROBINSON P, MORENCY L P. OpenFace: an open source facial behavior analysis toolkit[C]// Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2016: 1-10. |
| [24] | DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP: a collaborative voice analysis repository for speech technologies[C]// Proceedings of the 2014 International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2014: 960-964. |
| [25] | ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88. |
| [26] | ZADEH A, LIANG P P, VANBRIESEN J, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2018: 2236-2246. |
| [27] | TSAI Y H H, BAI S, LIANG P P, et al. Multimodal Transformer for unaligned multimodal language sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6558-6569. |
| [28] | SUN H, WANG H, LIU J, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 3722-3729. |
| [29] | LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2018: 2247-2256. |
| [30] | TSAI Y H H, LIANG P P, ZADEH A, et al. Learning factorized multimodal representations[EB/OL]. [2025-04-21].. |
| [31] | SUN Z, SARMA P K, SETHARES W A, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 8992-8999. |
| [32] | RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained Transformers[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2359-2369. |
| [33] | YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 10790-10797. |
| [34] | ZHANG H, WANG Y, YIN G, et al. Learning language-guided adaptive hyper-modality representation for multimodal sentiment analysis[C]// Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 756-767. |
| [35] | YANG J, YU Y, NIU D, et al. ConFEDE: contrastive feature decomposition for multimodal sentiment analysis[C]// Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2023: 7617-7630. |
| [36] | WANG P, ZHOU Q, WU Y, et al. DLF: disentangled-language-focused multimodal sentiment analysis[C]// Proceedings of the 39th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2025: 21180-21188. |