Journal of Computer Applications, 2024, Vol. 44, Issue (1): 65-72. DOI: 10.11772/j.issn.1001-9081.2022101527
Cross-media representation learning and cognitive reasoning
Yuxiang LIN (1, 2), Yunbing WU (1, 2; corresponding author), Aiying YIN (3), Xiangwen LIAO (1, 2)
Received: 2022-10-14; Revised: 2023-02-08; Accepted: 2023-02-14; Online: 2023-04-12; Published: 2024-01-10
Contact: Yunbing WU
About author: LIN Yuxiang, born in 1998, from Pingtan, Fujian, M. S. candidate. His research interests include multimodal summarization and natural language processing.
Supported by:
CLC Number:
Yuxiang LIN, Yunbing WU, Aiying YIN, Xiangwen LIAO. Multi-modal summarization model based on semantic relevance analysis[J]. Journal of Computer Applications, 2024, 44(1): 65-72.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022101527
Tab. 1 Information of the MMSS dataset

| Split | Sentence-headline pairs | Images |
|---|---|---|
| Training set | 62,000 | 62,000 |
| Validation set | 2,000 | 2,000 |
| Test set | 2,000 | 2,000 |
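Tab. 2 Experimental parameter settings of the summary generation module

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Hidden state dimension | 512 | Initial learning rate | 0.0005 |
| Word embedding dimension | 300 | Learning rate decay rate | 0.5 |
| batch_size | 8 | Dropout | 0.2 |
| Beam width (beam search) | 16 | Gradient clipping | 2.0 |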
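The settings in Tab. 2 can be gathered into one configuration object. The sketch below is illustrative only. The names `GeneratorConfig`, `decayed_lr` and `clip_by_global_norm` are hypothetical. The table does not say what triggers the 0.5 learning-rate decay. Reading "gradient clipping 2.0" as a cap on the gradient's global L2 norm is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Summary generation module settings, with values taken from Tab. 2."""
    hidden_dim: int = 512
    word_emb_dim: int = 300
    batch_size: int = 8
    beam_width: int = 16
    initial_lr: float = 0.0005
    lr_decay: float = 0.5
    dropout: float = 0.2
    grad_clip: float = 2.0


def decayed_lr(cfg: GeneratorConfig, num_decays: int) -> float:
    """Learning rate after the decay factor has been applied num_decays times.

    Assumption: Tab. 2 gives only the decay rate, not what triggers a decay
    (e.g. every epoch or a validation plateau), so the caller counts decays.
    """
    return cfg.initial_lr * cfg.lr_decay ** num_decays


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat gradient vector so its L2 norm is at most max_norm.

    Assumption: "gradient clipping 2.0" is read as norm-based clipping.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


cfg = GeneratorConfig()
lr_after_two_decays = decayed_lr(cfg, 2)
clipped = clip_by_global_norm([3.0, 4.0], cfg.grad_clip)
```

For example, `decayed_lr(cfg, 2)` gives 0.0005 × 0.5² = 0.000125. A gradient of [3.0, 4.0], whose norm is 5, is rescaled to norm 2.0 while keeping its direction.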
Tab. 3 Experimental parameter settings of the summary evaluation module

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| batch_size | 8 | warmup steps | 1,000 |
| num_epoch | 8 | max_lr | 0.002 |
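Tab. 3 gives `warmup steps` and `max_lr` for the evaluation module but not the shape of the schedule. The sketch below uses one common pattern, chosen here as an assumption and not the authors' stated schedule. The learning rate ramps up linearly to `max_lr` over the warmup steps and then decays in proportion to 1/√step. The name `evaluator_lr` is hypothetical.

```python
import math


def evaluator_lr(step, max_lr=0.002, warmup_steps=1000):
    """Learning rate at a given optimizer step (assumed schedule, not from the paper).

    Linear warmup: rises from max_lr / warmup_steps to max_lr over warmup_steps steps.
    Afterwards: decays as max_lr * sqrt(warmup_steps / step).
    """
    step = max(step, 1)  # steps are counted from 1
    warmup_factor = step / warmup_steps
    decay_factor = math.sqrt(warmup_steps / step)
    return max_lr * min(warmup_factor, decay_factor)


for s in (1, 500, 1000, 4000):
    print(s, evaluator_lr(s))
```

With these defaults, the rate is 0.001 at step 500 and peaks at 0.002 at step 1,000. It falls back to 0.001 at step 4,000.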
Tab. 4 Experimental results on the MMSS dataset

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Compress | 31.56 | 11.02 | 28.87 |
| ABS | 35.95 | 18.21 | 31.89 |
| SEASS | 44.86 | 23.03 | 41.92 |
| PGNet | 46.05 | 24.18 | 44.16 |
| MAtt | 45.78 | 23.45 | 43.16 |
| MPID | 48.11 | 24.70 | 44.96 |
| MPMSE | 48.19 | 25.64 | 45.27 |
| Proposed model | 51.36 | 26.85 | 47.51 |
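Tab. 4 reports ROUGE-1, ROUGE-2 and ROUGE-L on a 0-100 scale. The simplified sketch below shows how such scores are derived. ROUGE-N comes from clipped n-gram overlap, and ROUGE-L from the longest common subsequence (LCS). The sketch assumes whitespace-tokenized input and F1 scores. It omits the stemming and other preprocessing that standard ROUGE toolkits apply, so its numbers are not expected to match published results exactly.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_len, ref_len):
    """Harmonic mean of precision (overlap / cand_len) and recall (overlap / ref_len)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

In this example, five of six unigrams and three of five bigrams overlap, so ROUGE-1 ≈ 0.833 and ROUGE-2 = 0.6. The LCS ("the cat on the mat") has length 5, which gives ROUGE-L ≈ 0.833.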
Tab. 5 Influence of removing different modules on experimental results

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Proposed model | 51.36 | 26.85 | 47.51 |
| w/o | 50.40 | 26.12 | 46.68 |
| w/o | 48.79 | 25.94 | 45.72 |
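The cost of each ablation in Tab. 5 is the difference between the full model and the ablated variant. The small helper below computes those differences from the table values; `rouge_drops` is a hypothetical name. The ablated module names are missing from the table, so the two `w/o` rows are kept in their original order under placeholder labels.

```python
FULL_MODEL = (51.36, 26.85, 47.51)  # ROUGE-1, ROUGE-2, ROUGE-L from Tab. 5

# Placeholder labels: the module names of the "w/o" rows are not given in the table.
ABLATIONS = [
    ("w/o (row 1)", (50.40, 26.12, 46.68)),
    ("w/o (row 2)", (48.79, 25.94, 45.72)),
]


def rouge_drops(full, ablated):
    """Point drop on each metric, rounded to the table's two decimals."""
    return tuple(round(f - a, 2) for f, a in zip(full, ablated))


for name, scores in ABLATIONS:
    print(name, rouge_drops(FULL_MODEL, scores))
```

Removing the module in the second row costs 2.57 ROUGE-1, 0.91 ROUGE-2 and 1.79 ROUGE-L points. That is a larger drop on all three metrics than the first row (0.96, 0.73, 0.83).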
Tab. 6 Experimental results of different summary evaluators

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Without summary evaluator | 48.79 | 25.94 | 45.72 |
| Summary evaluator | 51.36 | 26.85 | 47.51 |
| Summary evaluator | 49.86 | 26.14 | 46.54 |
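Tab. 6 compares different summary evaluators. The sketch below is a hedged illustration of how an evaluator can pick the final summary, following the SimCLS-style generate-then-rerank idea [13]. Candidates, such as the 16 beams implied by Tab. 2, are scored by cosine similarity between a source embedding and each candidate embedding, and the highest-scoring one is kept. The embeddings are plain float lists standing in for encoder outputs, for example from RoBERTa [21]. `cosine`, `rerank`, `ranking_loss` and the margin value 0.01 are hypothetical. The loss is a simplified pairwise margin loss that omits the reference-summary term used in SimCLS. The code is not claimed to be the authors' evaluator.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(source_vec, candidate_vecs):
    """Score every candidate against the source; return (best_index, scores)."""
    scores = [cosine(source_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def ranking_loss(scores_by_quality, margin=0.01):
    """Pairwise margin loss over candidate scores sorted from best to worst
    (e.g. by ROUGE against the reference). A candidate ranked i should outscore
    one ranked j > i by at least (j - i) * margin; violations add to the loss."""
    loss = 0.0
    for i, better in enumerate(scores_by_quality):
        for j in range(i + 1, len(scores_by_quality)):
            worse = scores_by_quality[j]
            loss += max(0.0, worse - better + (j - i) * margin)
    return loss


source = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]
best, scores = rerank(source, candidates)
print(best, [round(s, 3) for s in scores])
```

Here the third candidate (index 2) points almost in the same direction as the source vector, so it is selected. During training, `ranking_loss` is zero only when the evaluator's scores respect the candidates' quality order by the required margins.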
Tab. 7 Influence of removing the visual global information from different modules on experimental results

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Proposed model | 51.36 | 26.85 | 47.51 |
| | 50.40 | 26.12 | 46.68 |
| | 50.75 | 26.16 | 46.76 |
| | 50.46 | 25.72 | 46.32 |
[1] SOLEYMANI M, GARCIA D, JOU B, et al. A survey of multimodal sentiment analysis [J]. Image and Vision Computing, 2017, 65(9): 3-14. DOI: 10.1016/j.imavis.2017.08.003
[2] LI H, ZHU J, LIU T, et al. Multi-modal sentence summarization with modality attention and image filtering [C]// Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 4152-4158. DOI: 10.24963/ijcai.2018/577
[3] LI H, ZHU J, ZHANG J, et al. Multimodal sentence summarization via multimodal selective encoding [C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5655-5667. DOI: 10.18653/v1/2020.coling-main.496
[4] MIHALCEA R, TARAU P. TextRank: Bringing order into text [C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2004: 404-411. DOI: 10.3115/1220355.1220517
[5] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2015: 379-389. DOI: 10.18653/v1/d15-1044
[6] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks [C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2016: 93-98. DOI: 10.18653/v1/n16-1012
[7] NALLAPATI R, ZHOU B, SANTOS C D, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond [C]// Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg, PA: Association for Computational Linguistics, 2016: 280-290. DOI: 10.18653/v1/k16-1028
[8] GU J, LU Z, LI H, et al. Incorporating copying mechanism in sequence-to-sequence learning [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2016: 1631-1640. DOI: 10.18653/v1/p16-1154
[9] SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks [C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2017: 1073-1083. DOI: 10.18653/v1/p17-1099
[10] ZHU J, LI H, LIU T, et al. MSMO: Multimodal summarization with multimodal output [C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 4154-4164. DOI: 10.18653/v1/d18-1448
[11] YE X, YUE Z, LIU R. MBA: A multimodal bilinear attention model with residual connection for abstractive multimodal summarization [J]. Journal of Physics: Conference Series, 2021, 1856: 012070. DOI: 10.1088/1742-6596/1856/1/012070
[12] ZHANG Z, WANG J, SUN Z, et al. LAMS: A location-aware approach for multimodal summarization [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(18): 15949-15950. DOI: 10.1609/aaai.v35i18.17971
[13] LIU Y, LIU P. SimCLS: A simple framework for contrastive learning of abstractive summarization [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2021: 1065-1072. DOI: 10.18653/v1/2021.acl-short.135
[14] PAULUS R, XIONG C, SOCHER R. A deep reinforced model for abstractive summarization [EB/OL]. [2022-10-01].
[15] LI S, LEI D, QIN P, et al. Deep reinforcement learning with distributional semantic rewards for abstractive summarization [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 6038-6044. DOI: 10.18653/v1/d19-1623
[16] SHEN W, GONG Y, SHEN Y, et al. Joint generator-ranker learning for natural language generation [EB/OL]. (2022-10-19) [2023-02-06]. DOI: 10.18653/v1/2023.findings-acl.486
[17] PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection [C]// Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA: Association for Computational Linguistics, 2020: 1383-1392. DOI: 10.18653/v1/2020.findings-emnlp.124
[18] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks [J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681. DOI: 10.1109/78.650093
[19] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [EB/OL]. [2022-10-01]. DOI: 10.1017/9781108608480.003
[20] HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735
[21] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-06-26) [2023-02-06].
[22] CAI Z X, SUN J W. News text summarization model integrating pointer network [J]. Journal of Chinese Computer Systems, 2021, 42(3): 462-466. DOI: 10.3969/j.issn.1000-1220.2021.03.003
[23] LI H, YUAN P, XU S, et al. Aspect-aware multimodal summarization for Chinese e-commerce products [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8188-8195. DOI: 10.1609/aaai.v34i05.6332
[24] CLARKE J, LAPATA M. Global inference for sentence compression: An integer linear programming approach [J]. Journal of Artificial Intelligence Research, 2008, 31(1): 399-429. DOI: 10.1613/jair.2433
[25] ZHOU Q, YANG N, WEI F, et al. Selective encoding for abstractive sentence summarization [C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2017: 1095-1104. DOI: 10.18653/v1/p17-1101