Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 1-15. DOI: 10.11772/j.issn.1001-9081.2023050583

• Cross-media representation learning and cognitive reasoning •

Multimodal knowledge graph representation learning: a review

Chunlei WANG1,2, Xiao WANG1, Kai LIU3

  1. Institute of Artificial Intelligence, Shanghai University, Shanghai 200444, China
  2. Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  3. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Received: 2023-05-15 Revised: 2023-06-23 Accepted: 2023-06-28 Online: 2023-08-01 Published: 2024-01-10
  • Contact: Xiao WANG
  • About author: WANG Chunlei, born in 1977, Ph.D., research fellow. His research interests include knowledge graphs and cognitive intelligence, affective computing and emotion recognition.
    WANG Xiao, born in 1998, M.S. candidate. His research interests include multimodal knowledge graphs.
    LIU Kai, born in 1996, M.S. candidate. His research interests include knowledge graphs.
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62225308)

多模态知识图谱表示学习综述

王春雷1,2, 王肖1, 刘凯3

  1. 上海大学 人工智能研究院, 上海 200444
    2.上海人工智能实验室, 上海 200232
    3.上海大学 计算机工程与科学学院, 上海 200444
  • 通讯作者: 王肖
  • 作者简介:王春雷(1977—),男,江苏盐城人,研究员,博士,CCF会员,主要研究方向:知识图谱与认知智能、情感计算与情绪识别;
    刘凯(1996—),男,江西抚州人,硕士研究生,主要研究方向:知识图谱。
    第一联系人:王肖(1998—),男,安徽亳州人,硕士研究生,主要研究方向:多模态知识图谱;
  • 基金资助:
    国家杰出青年科学基金资助项目(62225308)

Abstract:

This review first compares traditional knowledge graph representation learning models in terms of their advantages, disadvantages, and applicable tasks; the comparison shows that traditional single-modal knowledge graphs cannot represent knowledge well. How to use multimodal data such as text, images, video, and audio for knowledge graph representation learning has therefore become an important research direction. Commonly used multimodal knowledge graph datasets are then analyzed in detail to provide data support for researchers in the field. On this basis, knowledge graph representation learning models based on the multimodal fusion of text, images, video, and audio are discussed, and the various models are summarized and compared. Finally, the review summarizes how multimodal knowledge graph representation learning improves classical applications in practice, including knowledge graph completion, question answering systems, multimodal generation, and recommendation systems, and outlines directions for future research.

Key words: multimodal knowledge graph, representation learning, multimodal fusion, knowledge graph completion, multimodal generation

摘要:

在综合对比传统知识图谱表示学习模型优缺点以及适用任务后,发现传统的单一模态知识图谱无法很好地表示知识。因此,如何利用文本、图片、视频、音频等多模态数据进行知识图谱表示学习成为一个重要的研究方向。同时,详细分析了常用的多模态知识图谱数据集,为相关研究人员提供数据支持。在此基础上,进一步讨论了文本、图片、视频、音频等多模态融合下的知识图谱表示学习模型,并对其中各种模型进行了总结和比较。最后,总结了多模态知识图谱表示学习如何改善经典应用,包括知识图谱补全、问答系统、多模态生成和推荐系统在实际应用中的效果,并对未来的研究工作进行了展望。

关键词: 多模态知识图谱, 表示学习, 多模态融合, 知识图谱补全, 多模态生成

CLC Number: