Journal of Computer Applications, 2024, Vol. 44, Issue (1): 65-72. DOI: 10.11772/j.issn.1001-9081.2022101527
• Cross-media representation learning and cognitive reasoning •
					
Yuxiang LIN1,2, Yunbing WU1,2, Aiying YIN3, Xiangwen LIAO1,2
Received: 2022-10-14
Revised: 2023-02-08
Accepted: 2023-02-14
Online: 2023-04-12
Published: 2024-01-10
Contact: Yunbing WU
About author: LIN Yuxiang, born in 1998, native of Pingtan, Fujian, M. S. candidate. His research interests include multimodal summarization and natural language processing.
        
                   
Yuxiang LIN, Yunbing WU, Aiying YIN, Xiangwen LIAO. Multi-modal summarization model based on semantic relevance analysis[J]. Journal of Computer Applications, 2024, 44(1): 65-72.
林于翔, 吴运兵, 阴爱英, 廖祥文. 基于语义相关性分析的多模态摘要模型[J]. 计算机应用, 2024, 44(1): 65-72.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022101527
| Dataset split | Sentence-headline pairs | Images |
|---|---|---|
| Training set | 62,000 | 62,000 |
| Validation set | 2,000 | 2,000 |
| Test set | 2,000 | 2,000 |
Tab. 1 Information of MMSS dataset
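| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Hidden state dimension | 512 | Initial learning rate | 0.0005 |
| Word embedding dimension | 300 | Learning rate decay rate | 0.5 |
| batch_size | 8 | Dropout | 0.2 |
| Beam width (beam search) | 16 | Gradient clipping | 2.0 |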
Tab. 2 Experimental parameter settings of summary generation module
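The settings in Tab. 2 can be gathered into one configuration object. The following is a minimal sketch, not the authors' code: the class name `GenerationConfig` and every field name are hypothetical, and only the values are taken from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationConfig:
    """Summary generation module settings (values from Tab. 2).

    All identifiers here are hypothetical; the paper gives no field names.
    """
    hidden_size: int = 512        # hidden state dimension
    embedding_size: int = 300     # word embedding dimension
    batch_size: int = 8
    beam_width: int = 16          # beam width used in beam search
    initial_lr: float = 0.0005    # initial learning rate
    lr_decay_rate: float = 0.5    # learning rate decay rate
    dropout: float = 0.2
    grad_clip: float = 2.0        # gradient clipping value


config = GenerationConfig()
# Print each setting on its own line for a quick sanity check.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

Freezing the dataclass keeps the settings immutable after construction, which helps when one configuration is shared between training and decoding.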
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| batch_size | 8 | warmup steps | 1,000 |
| num_epoch | 8 | max_lr | 0.002 |
Tab. 3 Experimental parameter settings of summary evaluation module
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Compress | 31.56 | 11.02 | 28.87 |
| ABS | 35.95 | 18.21 | 31.89 |
| SEASS | 44.86 | 23.03 | 41.92 |
| PGNet | 46.05 | 24.18 | 44.16 |
| MAtt | 45.78 | 23.45 | 43.16 |
| MPID | 48.11 | 24.70 | 44.96 |
| MPMSE | 48.19 | 25.64 | 45.27 |
| Proposed model | 51.36 | 26.85 | 47.51 |
Tab. 4 Experimental results on MMSS dataset
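The margin of the proposed model over the strongest baseline in Tab. 4 can be computed directly from the reported scores. The sketch below only performs that subtraction; the dictionary keys and the function name `absolute_gains` are illustrative labels, not names from the paper.

```python
# ROUGE scores copied from Tab. 4, in the order ROUGE-1, ROUGE-2, ROUGE-L.
SCORES = {
    "MPMSE": (48.19, 25.64, 45.27),
    "proposed": (51.36, 26.85, 47.51),  # "proposed" is an illustrative key
}
METRICS = ("ROUGE-1", "ROUGE-2", "ROUGE-L")


def absolute_gains(baseline, model):
    """Return per-metric differences (model minus baseline), rounded to 2 decimals."""
    return {
        metric: round(m - b, 2)
        for metric, b, m in zip(METRICS, SCORES[baseline], SCORES[model])
    }


gains = absolute_gains("MPMSE", "proposed")
for metric, gain in gains.items():
    print(f"{metric}: +{gain:.2f}")
```

On these figures the differences are 3.17, 1.21, and 2.24 points for ROUGE-1, ROUGE-2, and ROUGE-L respectively.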
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Proposed model | 51.36 | 26.85 | 47.51 |
| w/o | 50.40 | 26.12 | 46.68 |
| w/o | 48.79 | 25.94 | 45.72 |
Tab. 5 Influence of removing different modules on experimental results
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Without summary evaluator | 48.79 | 25.94 | 45.72 |
| Summary evaluator | 51.36 | 26.85 | 47.51 |
| Summary evaluator | 49.86 | 26.14 | 46.54 |
Tab. 6 Experimental results of different summary evaluators
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Proposed model | 51.36 | 26.85 | 47.51 |
|  | 50.40 | 26.12 | 46.68 |
|  | 50.75 | 26.16 | 46.76 |
|  | 50.46 | 25.72 | 46.32 |
Tab. 7 Influence of removing visual global information of different modules on experimental results
| 1 | SOLEYMANI M, GARCIA D, JOU B, et al. A survey of multimodal sentiment analysis [J]. Image and Vision Computing, 2017, 65(9): 3-14. 10.1016/j.imavis.2017.08.003 | 
| 2 | LI H, ZHU J, LIU T, et al. Multi-modal sentence summarization with modality attention and image filtering [C]// Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 4152-4158. 10.24963/ijcai.2018/577 | 
| 3 | LI H, ZHU J, ZHANG J, et al. Multimodal sentence summarization via multimodal selective encoding [C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5655-5667. 10.18653/v1/2020.coling-main.496 | 
| 4 | MIHALCEA R, TARAU P. TextRank: Bringing order into text [C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2004: 404-411. 10.3115/1220355.1220517 | 
| 5 | RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2015: 379-389. 10.18653/v1/d15-1044 | 
| 6 | CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks [C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2016: 93-98. 10.18653/v1/n16-1012 | 
| 7 | NALLAPATI R, ZHOU B, SANTOS C D, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond [C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg, PA: Association for Computational Linguistics, 2016: 280-290. 10.18653/v1/k16-1028 | 
| 8 | GU J, LU Z, LI H, et al. Incorporating copying mechanism in sequence-to-sequence learning [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2016: 1631-1640. 10.18653/v1/p16-1154 | 
| 9 | SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks [C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2017: 1073-1083. 10.18653/v1/p17-1099 | 
| 10 | ZHU J, LI H, LIU T, et al. MSMO: Multimodal summarization with multimodal output [C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 4154-4164. 10.18653/v1/d18-1448 | 
| 11 | YE X, YUE Z, LIU R. MBA: A multimodal bilinear attention model with residual connection for abstractive multimodal summarization [J]. Journal of Physics: Conference Series, 2021, 1856: 012070. 10.1088/1742-6596/1856/1/012070 | 
| 12 | ZHANG Z, WANG J, SUN Z, et al. LAMS: A location-aware approach for multimodal summarization [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(18): 15949-15950. 10.1609/aaai.v35i18.17971 | 
| 13 | LIU Y, LIU P. SimCLS: A simple framework for contrastive learning of abstractive summarization [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2021: 1065-1072. 10.18653/v1/2021.acl-short.135 | 
| 14 | PAULUS R, XIONG C, SOCHER R. A deep reinforced model for abstractive summarization [EB/OL]. [2022-10-01]. |
| 15 | LI S, LEI D, QIN P, et al. Deep reinforcement learning with distributional semantic rewards for abstractive summarization [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6038-6044. 10.18653/v1/d19-1623 | 
| 16 | SHEN W, GONG Y, SHEN Y, et al. Joint generator-ranker learning for natural language generation [EB/OL]. (2022-10-19) [2023-02-06]. 10.18653/v1/2023.findings-acl.486 |
| 17 | PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection [C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020: 1383-1392. 10.18653/v1/2020.findings-emnlp.124 | 
| 18 | SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks [J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681. 10.1109/78.650093 | 
| 19 | BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [EB/OL]. [2022-10-01]. 10.1017/9781108608480.003 |
| 20 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. 10.1162/neco.1997.9.8.1735 | 
| 21 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-06-26) [2023-02-06]. |
| 22 | CAI Z X, SUN J W. News text summarization model integrating pointer network [J]. Journal of Chinese Computer Systems, 2021, 42(3): 462-466 (in Chinese). 10.3969/j.issn.1000-1220.2021.03.003 |
| 23 | LI H, YUAN P, XU S, et al. Aspect-aware multimodal summarization for Chinese e-commerce products [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8188-8195. 10.1609/aaai.v34i05.6332 | 
| 24 | CLARKE J, LAPATA M. Global inference for sentence compression: An integer linear programming approach [J]. Journal of Artificial Intelligence Research, 2008, 31(1): 399-429. 10.1613/jair.2433 | 
| 25 | ZHOU Q, YANG N, WEI F, et al. Selective encoding for abstractive sentence summarization [C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 1095-1104. 10.18653/v1/p17-1101 | 