Journal of Computer Applications, 2025, Vol. 45, Issue (11): 3540-3546. DOI: 10.11772/j.issn.1001-9081.2024111561

• Artificial intelligence •

Multimodal knowledge graph link prediction method based on fusing image and textual information

Huilin GUI, Kun YUE, Liang DUAN

  1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
  • Received: 2024-11-04 Revised: 2024-11-11 Accepted: 2024-11-22 Online: 2024-12-06 Published: 2025-11-10
  • Contact: Liang DUAN
  • About author: GUI Huilin, born in 1999, M.S. candidate. Her research interests include link prediction and knowledge engineering.
    YUE Kun, born in 1979, Ph.D., professor, CCF senior member. His research interests include big data, knowledge engineering, and Bayesian deep learning.
  • Supported by:
    Major Science and Technology Project of Yunnan Province (202202AD080001); General Program of Yunnan Fundamental Research Plan (202301AT070193); Youth Talent Program of the “Xingdian Talents Support Plan” of Yunnan Province (C6213001195)

Abstract:

Introducing multimodal information to enhance knowledge graph link prediction has become a recent research hotspot. However, most existing methods rely on simple concatenation or attention mechanisms for multimodal feature fusion, ignoring the correlation and semantic inconsistency between modalities; as a result, they may fail to preserve modality-specific information and cannot fully exploit the complementary information across modalities. To address these issues, a multimodal knowledge graph link prediction model based on a cross-modal attention mechanism and contrastive learning, named FITILP (Fusing Image and Textual Information for Link Prediction), was proposed. Firstly, pretrained models such as BERT (Bidirectional Encoder Representations from Transformers) and ResNet (Residual Network) were used to extract the textual and visual features of entities. Then, a Contrastive Learning (CL) approach was applied to reduce the semantic inconsistency across modalities, and a cross-modal attention module was designed to refine the attention parameters of the text features using the image features, thereby strengthening the cross-modal correlation between text and image. Translation models such as TransE (Translating Embeddings) and TransH (Translation on Hyperplanes) were then employed to generate the graph-structural, visual, and textual features. Finally, the three types of features were fused to perform link prediction between entities. Experimental results on the DB15K dataset show that the FITILP model improves the Mean Reciprocal Rank (MRR) by 6.6 percentage points over the single-modal baseline TransE, with improvements of 3.95, 11.37, and 14.01 percentage points in Hits@1, Hits@10, and Hits@100, respectively. These results indicate that the proposed method outperforms the compared baselines and effectively leverages multimodal information to improve link prediction performance.
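
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the three components the abstract names: a symmetric InfoNCE-style contrastive loss aligning the text and image features of the same entity, a cross-modal attention module in which image features act as queries over text token features, and a TransE score over the fused embeddings. The feature dimensions, the image-as-query design, and the additive fusion rule are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of FITILP-style components (assumed design, for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAttention(nn.Module):
    """Image features act as queries that re-weight text token features,
    so the refined text representation is conditioned on the image."""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, text_tokens: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, L, D), e.g. projected BERT token features
        # image:       (B, 1, D), e.g. projected ResNet pooled features
        refined, _ = self.attn(query=image, key=text_tokens, value=text_tokens)
        return refined.squeeze(1)  # (B, D) image-conditioned text feature


def contrastive_loss(text: torch.Tensor, image: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched text/image pairs of the same entity are
    pulled together, mismatched pairs pushed apart, which reduces the
    semantic inconsistency between the two modalities."""
    t = F.normalize(text, dim=-1)
    v = F.normalize(image, dim=-1)
    logits = t @ v.t() / tau                       # (B, B) cosine similarities
    labels = torch.arange(t.size(0))               # positives on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """TransE plausibility of a triple (h, r, t): higher is better, -||h + r - t||."""
    return -torch.norm(h + r - t, p=1, dim=-1)


if __name__ == "__main__":
    B, L, D = 32, 16, 256
    text_tokens = torch.randn(B, L, D)             # stand-in for BERT features
    image = torch.randn(B, 1, D)                   # stand-in for ResNet features

    refined_text = CrossModalAttention(D)(text_tokens, image)
    cl_loss = contrastive_loss(refined_text, image.squeeze(1))

    # Fuse structural, visual, and textual entity features (a simple sum here;
    # the paper's fusion may differ) and score a batch of triples.
    struct = torch.randn(B, D)                     # stand-in for TransE/TransH embeddings
    head = struct + refined_text + image.squeeze(1)
    rel, tail = torch.randn(B, D), torch.randn(B, D)
    scores = transe_score(head, rel, tail)
    print(cl_loss.item(), scores.shape)            # scalar loss, (32,) scores
```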

Key words: Multimodal Knowledge Graph (MKG), Link Prediction (LP), Contrastive Learning (CL), multimodal feature fusion
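
For reference, the reported metrics follow their standard definitions: MRR averages the reciprocal rank of the true entity over all test triples, and Hits@k is the fraction of test triples whose true entity is ranked within the top k. A tiny worked example with hypothetical ranks:

```python
# Standard MRR / Hits@k computation over hypothetical ranks of true entities
# (not the authors' evaluation code).
ranks = [1, 3, 12, 85, 2]
mrr = sum(1.0 / r for r in ranks) / len(ranks)
hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in (1, 10, 100)}
print(f"MRR={mrr:.3f}", hits)   # MRR=0.386 {1: 0.2, 10: 0.6, 100: 1.0}
```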

