Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 73-78. DOI: 10.11772/j.issn.1001-9081.2022121910

• Cross-media Representation Learning and Cognitive Reasoning •

Product summarization extraction model with multimodal information fusion

Qiang ZHAO, Zhongqing WANG, Hongling WANG()   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received:2023-01-09 Revised:2023-04-25 Accepted:2023-04-25 Online:2024-01-24 Published:2024-01-10
  • Contact: Hongling WANG
  • About author:ZHAO Qiang, born in 1996, M. S. candidate. His research interests include multimodal summarization.
    WANG Zhongqing, born in 1987, Ph. D., associate professor. His research interests include sentiment analysis.
    WANG Hongling, born in 1975, Ph. D., associate professor (corresponding author). Her research interests include text summarization.
  • Supported by:
    National Natural Science Foundation of China(61976146)

Abstract:

On online shopping platforms, concise, authentic and effective product summaries are crucial to improving the shopping experience. Since online shoppers cannot touch the actual product, the product image carries important visual information beyond the product text description, so a product summary that fuses multimodal information, including the product text and the product image, is of great significance for online shopping. To fuse product text descriptions and product images, a product summarization extraction model with multimodal information fusion was proposed. Unlike the general product summarization task, whose input contains only the product text description, the proposed model introduces the product image as an additional source of information, making the extracted summary richer. Specifically, pre-trained models were first used to represent the product text description and the product image: a text feature representation was extracted for each sentence of the product text description, and an overall visual feature representation of the product was extracted from the product image. Then, a low-rank tensor-based multimodal fusion method was used to fuse the text feature of each sentence with the overall visual feature, yielding a multimodal feature representation for each sentence. Finally, the multimodal feature representations of all sentences were fed into a summary generator to produce the final product summary. Comparative experiments were conducted on the CEPSUM 2.0 (Chinese E-commerce Product SUMmarization 2.0) dataset. On the three subsets of CEPSUM 2.0, the average ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation 1) of the proposed model is 3.12 percentage points higher than that of TextRank and 1.75 percentage points higher than that of BERTSUMExt (BERT SUMmarization Extractive). Experimental results show that the proposed model fuses product text and image information effectively and performs well on the ROUGE evaluation metrics.
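To make the fusion step above concrete, the following is a minimal PyTorch sketch of low-rank tensor fusion between one sentence's text feature and the product's overall visual feature. It is not the authors' implementation; the class name, encoder dimensions (768 for text, 2048 for image), rank and output size are illustrative assumptions only, since the abstract does not specify them.

    import torch
    import torch.nn as nn

    class LowRankFusion(nn.Module):
        """Hypothetical sketch of low-rank tensor fusion: combine one sentence's
        text feature with the product's overall visual feature without building
        the full outer-product tensor."""

        def __init__(self, text_dim: int, image_dim: int, out_dim: int, rank: int = 4):
            super().__init__()
            # One low-rank factor per modality; the "+ 1" slot appends a constant 1
            # so unimodal information is preserved in the fused representation.
            self.text_factor = nn.Parameter(torch.randn(rank, text_dim + 1, out_dim) * 0.01)
            self.image_factor = nn.Parameter(torch.randn(rank, image_dim + 1, out_dim) * 0.01)
            self.fusion_weights = nn.Parameter(torch.randn(rank) * 0.01)
            self.fusion_bias = nn.Parameter(torch.zeros(out_dim))

        def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
            # text_feat: (batch, text_dim), image_feat: (batch, image_dim)
            ones = text_feat.new_ones(text_feat.size(0), 1)
            t = torch.cat([text_feat, ones], dim=1)
            v = torch.cat([image_feat, ones], dim=1)
            # Project each modality through its rank-r factors ...
            t_proj = torch.einsum('bd,rdo->rbo', t, self.text_factor)
            v_proj = torch.einsum('bd,rdo->rbo', v, self.image_factor)
            # ... then fuse by element-wise product and a weighted sum over the ranks.
            fused = torch.einsum('r,rbo->bo', self.fusion_weights, t_proj * v_proj)
            return fused + self.fusion_bias

    if __name__ == "__main__":
        # Dummy features standing in for pre-trained encoder outputs
        # (e.g. a BERT-style sentence vector and a CNN-style image vector).
        sentence_feat = torch.randn(16, 768)   # 16 sentences of one product description
        image_feat = torch.randn(16, 2048)     # product-level visual feature, repeated per sentence
        fusion = LowRankFusion(text_dim=768, image_dim=2048, out_dim=256)
        multimodal = fusion(sentence_feat, image_feat)  # (16, 256), one vector per sentence
        print(multimodal.shape)

In such a setup, the fused per-sentence vectors would then be passed to the summary generator (for example, a sentence-scoring extractive module), which is not sketched here.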

Key words: product summarization, multimodal summarization, extractive summarization, multimodal fusion, automatic summarization

CLC number: