Journal of Computer Applications, 2024, Vol. 44, Issue (1): 65-72. DOI: 10.11772/j.issn.1001-9081.2022101527

• Cross-media representation learning and cognitive reasoning •

Multi-modal summarization model based on semantic relevance analysis

Yuxiang LIN1,2, Yunbing WU1,2, Aiying YIN3, Xiangwen LIAO1,2

  1. College of Computer and Data Science, Fuzhou University, Fuzhou Fujian 350108, China
  2. Digital Fujian Institute of Financial Big Data, Fuzhou Fujian 350108, China
  3. Department of Computer Engineering, Zhicheng College of Fuzhou University, Fuzhou Fujian 350002, China
  • Received: 2022-10-14 Revised: 2023-02-08 Accepted: 2023-02-14 Online: 2023-04-12 Published: 2024-01-10
  • Contact: Yunbing WU
  • About author: LIN Yuxiang, born in 1998 in Pingtan, Fujian, M. S. candidate. His research interests include multimodal summarization and natural language processing.
    WU Yunbing (first contact), born in 1976 in Pingtan, Fujian, M. S., associate professor. His research interests include knowledge representation and knowledge discovery, and natural language processing.
    YIN Aiying, born in 1976 in Yuncheng, Shanxi, M. S., lecturer. Her research interests include data mining and text retrieval.
    LIAO Xiangwen, born in 1980 in Quanzhou, Fujian, Ph. D., professor, doctoral supervisor. His research interests include opinion mining, sentiment analysis and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61976054); Natural Science Foundation of Fujian Province (2022J01116)

Abstract:

Multi-modal abstractive summarization is commonly built on the Sequence-to-Sequence (Seq2Seq) framework, whose objective function optimizes the model at the character level: words are generated from locally optimal solutions, and the global semantic information of the summary sample is ignored. As a result, the summary can deviate semantically from the multimodal information, which easily leads to factual errors. To address these problems, a multi-modal summarization model based on semantic relevance analysis was proposed. Firstly, a summary generator was trained on the Seq2Seq framework to produce semantically diverse candidate summaries. Secondly, a summary evaluator based on semantic relevance analysis was constructed to learn, from a global perspective, the semantic differences among the candidate summaries and the ranking pattern of the ground-truth evaluation metric ROUGE (Recall-Oriented Understudy for Gisting Evaluation), so that the model was optimized at the level of summary samples. Finally, without relying on reference summaries, the summary evaluator was used to score the candidate summaries, so that the selected summary was as similar as possible to the source text in semantic space. Experimental results on the public dataset MMSS show that, compared with the current best-performing MPMSE (Multimodal Pointer-generator via Multimodal Selective Encoding) model, the proposed model improves ROUGE-1, ROUGE-2 and ROUGE-L by 3.17, 1.21 and 2.24 percentage points respectively.
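The evaluate-and-select stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the evaluator's architecture or loss, so everything here is an assumption made for illustration: embeddings are plain NumPy vectors, the score is cosine similarity, and a pairwise margin ranking loss stands in for whatever objective the authors actually use to learn the ROUGE ranking pattern. The function names `cosine_similarity`, `pairwise_ranking_loss` and `select_summary`, and the `margin` parameter, are hypothetical.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def pairwise_ranking_loss(scores, rouge, margin=0.01):
    """Margin ranking loss over candidate summaries (assumed stand-in objective).

    scores[i] is the evaluator's score for candidate i; rouge[i] is its ROUGE
    value against the reference, available at training time only. Candidates
    are sorted by ROUGE, best first, and a better-ranked candidate is expected
    to outscore a worse one by a margin that grows with the rank gap.
    """
    order = np.argsort(-np.asarray(rouge, dtype=float))  # best ROUGE first
    s = np.asarray(scores, dtype=float)[order]
    loss = 0.0
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            loss += max(0.0, s[j] - s[i] + (j - i) * margin)
    return loss


def select_summary(source_vec, candidate_vecs):
    """Reference-free selection: pick the candidate closest to the source."""
    scores = [cosine_similarity(source_vec, c) for c in candidate_vecs]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Toy 2-D embeddings; a real system would use encoder outputs.
    source = np.array([1.0, 0.0])
    candidates = [np.array([0.0, 1.0]), np.array([1.0, 0.1]), np.array([1.0, 1.0])]
    best, scores = select_summary(source, candidates)
    print("selected candidate index:", best)
    print("ranking loss:", pairwise_ranking_loss(scores, rouge=[0.10, 0.45, 0.30]))
```

The loss needs ROUGE only during training; at inference `select_summary` uses nothing but the source and candidate embeddings, which is what makes the selection reference-free.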

Key words: multi-modal, abstractive summarization, Sequence-to-Sequence (Seq2Seq), factual error, semantic relevance

CLC Number: