Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (1): 1-15.DOI: 10.11772/j.issn.1001-9081.2023050583
• Cross-media representation learning and cognitive reasoning •
Chunlei WANG1,2, Xiao WANG1(), Kai LIU3
Received: 2023-05-15
Revised: 2023-06-23
Accepted: 2023-06-28
Online: 2023-08-01
Published: 2024-01-10
Contact: Xiao WANG
About author:
WANG Chunlei, born in 1977 in Yancheng, Jiangsu, Ph.D., research fellow, CCF member. His research interests include knowledge graphs and cognitive intelligence, affective computing and emotion recognition.
Chunlei WANG, Xiao WANG, Kai LIU. Multimodal knowledge graph representation learning: a review[J]. Journal of Computer Applications, 2024, 44(1): 1-15.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023050583
Model basis | Model | Applicable tasks | Advantages | Disadvantages |
---|---|---|---|---|
Translation-based | TransE | Link prediction | Simple and intuitive, with low computational complexity | Unsuitable for modeling complex relations |
Translation-based | TransH | Link prediction, triple classification, fact extraction | Overcomes TransE's inability to model complex relations, so entities still obtain suitable representations under multiple relations | Entities and relations live in the same space, which does not suit some scenarios |
Translation-based | TransR | Link prediction, triple classification, relational fact extraction | Improves the representation space by projecting entities and relations into different spaces | Higher complexity with a sharp increase in parameters, and the projection matrix depends only on the relation |
Translation-based | TransD | Triple classification, link prediction | Splits the projection matrix in two, so head and tail entities no longer share one projection matrix under the same relation | Complex training process requiring considerable computational resources |
Factorization | RESCAL | Link prediction, triple classification | Performs collective learning through the model's latent components and provides an efficient algorithm for computing the factorization | Many parameters and high complexity; computation is resource-intensive and poorly parallelizable |
Factorization | TuckER | Link prediction | Strong expressive power, able to fully separate positive from negative examples; the model is linear and relatively simple | Unsuitable for general prediction tasks: the model equations and optimization are derived separately per task, limiting applicability |
CNN | ConvE | Link prediction, entity-relation prediction | Expressive and parameter-efficient | Cannot capture interactions between input entities and relations, modeling them only within the adjacency matrix of the inputs |
CNN | ConvKB | Link prediction | Optimizes ConvE by replacing its reshaping operation with concatenation, preserving the translational property | Considers each triple independently, missing the complex, hidden information inherent in the triple's local neighborhood |
CNN | ReInceptionE | Link prediction | Addresses ConvE's limited interactions with Inception-enhanced interaction; motivated by KBGAT's shortcomings, proposes an embedding model that fully exploits local and global structural information | Too many hyperparameters, requiring re-tuning for each dataset |
RNN | RSN | Entity alignment, knowledge graph completion | Bridges the gap between entities through a skipping mechanism and adds residual learning to effectively capture long-term relational dependencies within and across KGs | Needs large computational resources and high-quality training data, and in practice its drawbacks can outweigh its advantages |
RNN | DRNN | Context-aware recommendation | Proposes a context-aware KG embedding based on first-order and subgraph-aware proximity, improving accuracy and scalability | Reducing parameters costs information, and gradients vanish or explode on longer sequences |
GNN | R-GCN | Link prediction, entity classification | Adds a relation-aggregation dimension, turning node aggregation into a dual aggregation and strengthening the ability to represent knowledge | Weighs all parameters and relations equally, ignoring the differences between relations |
GNN+attention | RAGAT | Knowledge graph completion | Distinguishes between relations and adds an attention mechanism, fully exploiting the heterogeneity of the KG | Many parameters; training ignores higher-order neighbors and is prone to over-smoothing |
Transformer | CoKE | Link prediction, path query answering | Rather than a single static representation per entity or relation, uses a Transformer encoder to obtain contextualized representations, learning KG embeddings that adapt dynamically to each input sequence and capture the contextual meanings of entities and relations | Too large and deep, with many parameters to learn and greater computational demands |
Transformer | HittER | Link prediction | A bottom block extracts features of each entity-relation pair in the source entity's local neighborhood; a top block aggregates relational information from the bottom block's output, better extracting rich semantic knowledge | Poor interpretability; the rich context introduced may contain spurious information that degrades the representation of the original entity and can cause overfitting |
BERT | KG-BERT | Triple classification | Captures rich semantic knowledge by adding embeddings of contextual information | Higher model complexity; performance improves but interpretability drops sharply |
Tab. 1 Summary of knowledge graph representation learning models
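To make the translation-based rows above concrete, here is a minimal numpy sketch of TransE's scoring function and margin ranking loss. The embedding tables, entity/relation indices, and margin are illustrative stand-ins, not trained values from any paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embedding tables: 5 entities, 3 relations (random, untrained).
E = rng.normal(size=(5, dim))
R = rng.normal(size=(3, dim))

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance of h + r from t.
    Scores closer to 0 mean the triple (h, r, t) is more plausible."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss over one positive and one corrupted triple."""
    return max(0.0, margin - transe_score(*pos) + transe_score(*neg))

# After training, a true triple should outscore its corrupted counterpart;
# here we only demonstrate the computation itself.
loss = margin_loss(pos=(0, 1, 2), neg=(0, 1, 4))
print(round(loss, 4))
```

Training would minimize this loss over all observed triples against sampled corruptions, which is exactly the low-complexity recipe the table credits TransE with.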
Model | Base dataset(s) | Applicable tasks | Performance notes | Improvement analysis |
---|---|---|---|---|
pTransE | Freebase[, NY Times | Triple classification, improved relation extraction, analogical reasoning | Comparable to or slightly better than TransE and word2vec (Skip-Gram); mainly aimed at inferring new relational facts | Modifies TransE: entities and words are jointly embedded and aligned in the same space for better embeddings |
CONV | FB15K-237[ | Textual inference | Larger improvements for entity pairs with textual mentions, improving link-prediction performance | Builds on the model of Riedel et al.[ |
TEKE | FB13[ | Link prediction, triple classification | In link prediction, Hits@10 clearly and consistently beats the other baselines, with TEKE_H slightly ahead of TEKE_R; in triple classification, TEKE_E and TEKE_H consistently beat the other baselines (the three models are TEKE variants built on TransE, TransR and TransH) | Built on TransE, TransH and TransR but with different optimization objectives; constructs a co-occurrence network from an entity-annotated text corpus to connect knowledge and text |
DKRL | FB15K[ | Knowledge graph completion, entity type classification | Best results among existing translation-based models in the zero-shot setting, with a clear advantage in zero-shot entity classification as well | Modifies TransE: entity embeddings model both the factual triples and the entity descriptions, exploiting the two jointly |
TKRL | FB15K[ | Knowledge graph completion, triple classification | Outperforms TransE and TransR on both tasks; amplifying the differences between same-type entities clearly improves triple classification | Improves on DKRL by adding multi-level entity types; designs two type encoders to model the hierarchical structure |
SSP | FB15K[ | Knowledge graph completion, entity classification | Outperforms the other baselines in KG completion and is best in entity classification; more accurate than TransE and DKRL | Adds two factors during description interaction to balance textual descriptions and triple information in the embedding vectors; runs a topic model and an embedding model jointly to learn semantics and embeddings together |
AATE, ATE | WN11[ | Link prediction, triple classification | In link prediction, AATE and ATE beat all baselines, with AATE ahead of ATE; in triple classification AATE likewise beats all baselines and improves accuracy on all datasets over ATE | Encodes relations and entities with a BiLSTM and proposes a mutual-attention mechanism to learn more accurate textual representations; the model comprises embedding, BiLSTM and mutual-attention layers |
KDCoE | WK3160K (based on DBpedia[ | Cross-lingual entity alignment, cross-lingual knowledge graph completion | The final stage of KDCoE surpasses all baselines in cross-lingual entity alignment; in cross-lingual KG completion, KDCoE-mono performs at least on par with TransE, showing that KDCoE preserves the structural features of monolingual KGs well | Jointly trains a multilingual Knowledge Graph Embedding Model (KGEM) and a multilingual literal Description Embedding Model (DEM); KGEM uses TransE, while DEM encodes multilingual entity descriptions with an Attentive Gated Recurrent Unit (AGRU) encoder |
KG-BERT | WN11[ | Triple classification, link prediction, relation prediction | Essentially beats all baselines on the three tasks, but link prediction is very time-consuming since nearly every entity must be substituted in as head or tail | Initializes from BERT-Base and fine-tunes it to model and represent the triples |
Tab. 2 Summary of knowledge graph learning models with text information
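As a rough illustration of how text-enhanced models such as DKRL combine structure and descriptions, the sketch below pairs a CBOW-style description encoder (the simplest of DKRL's variants) with TransE-style translation energies summed over both views. The vocabulary, word vectors, and entity descriptions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

# Hypothetical vocabulary and word vectors standing in for pretrained ones.
vocab = {"city": 0, "capital": 1, "france": 2, "river": 3}
W = rng.normal(size=(len(vocab), dim))

def describe(text):
    """CBOW-style description encoder: the description-based entity
    embedding is the mean of the description's word vectors."""
    ids = [vocab[w] for w in text.lower().split() if w in vocab]
    return W[ids].mean(axis=0)

# Structure-based (TransE-style) embeddings for head, relation, tail.
h_s, r, t_s = rng.normal(size=(3, dim))
# Description-based embeddings for the same head and tail entities.
h_d = describe("capital city")
t_d = describe("france")

def energy(h, t):
    return np.linalg.norm(h + r - t)

# The joint objective sums the energies of all structure/description
# combinations, so both views must satisfy the translation h + r ≈ t.
total = energy(h_s, t_s) + energy(h_s, t_d) + energy(h_d, t_s) + energy(h_d, t_d)
print(total > 0)
```

Because the description encoder works for unseen entities with text, this combination is what gives such models their zero-shot advantage noted in the table.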
Model | Base dataset(s) | Applicable tasks | Performance notes | Improvement analysis |
---|---|---|---|---|
IKRL | WN9-IMG (based on WN18[ | Knowledge graph completion, triple classification | Significantly outperforms the baselines in overall quality on both tasks; the results show that images provide complementary information and that the attention mechanism can jointly consider multiple instances | A translation-based overall architecture combining structured knowledge and visual information; jointly learns image and structure representations with a neural image encoder and instance-level attention |
MTKGRL | WN9-IMG[ | Link prediction, triple classification | Beats IKRL in link prediction; in triple classification it exploits multimodal information more effectively, improving average precision over TransE by more than 1 percentage point | A translation-based representation learning model that fuses visual and linguistic information, extending the definition of a triple to build new multimodal representations |
Improved NTL models | WN1M (based on WordNet[ | Category/attribute prediction | Clearly outperform the baseline INTL model on all three datasets; in the open-world (OW) case, performance is only slightly worse than on the zero-shot (ZS) dataset, but lower on the 1K dataset | Trains two KG embedding functions, one on the original NTL architecture and one on an improved, smoothed SNTL model; for image embedding, trains two VGG16-based models targeting the NTL and SNTL entity vectors respectively |
TransAE | WN9-IMG-TXT (based on WN9-IMG[ | Link prediction, triple classification | Beats all baselines in link prediction, with a clear edge over IKRL and DKRL; best in triple classification, where accuracy is in most cases high enough to separate the KG's positive triples from negative ones | Combines a multimodal autoencoder with TransE to learn multimodal and structural knowledge simultaneously; extracts visual and textual feature vectors, feeds them into the autoencoder, and uses the joint embedding as the entity representation |
RSME | WN18-IMG (based on WN18[ | Link prediction | Outperforms all other models; the notable gap between RSME(ViT) and RSME(No Img) shows that visual context genuinely helps, and comparing RSME(ViT) with RSME(ViT+Forget) shows the forget gate brings a further gain in most cases | Consists of a base KG embedding model and three gates: a filter gate automatically discards irrelevant images, a forget gate strengthens beneficial visual features, and a fusion gate then merges visual and KG structural information, with entity and relation embeddings obtained by minimizing the loss |
EVA | DBP15K[ | Entity alignment | Semi-supervised EVA sets a new SOTA on two EA benchmarks, far surpassing previous models; unsupervised EVA reaches above 70.0% accuracy | Uses visual similarity to create an initial seed dictionary, yielding a fully unsupervised solution; couples a multimodal embedding learning process with an alignment learning process to solve entity alignment |
HRGAT | FB15K-237[ | Multimodal knowledge graph completion | Highest on most evaluation metrics among the baselines; beats all four base models and all traditional KG embedding models, reaching high quality on multimodal KG completion | Comprises an information-fusion module (fusing multimodal features via pretrained embeddings and low-rank multimodal fusion), an information-aggregation module (capturing structural information in the multimodal KG), and a prediction module (for multimodal KG completion) |
MMKRL | WN9-IMG[ | Link prediction, triple classification | In link prediction, highest on all metrics except Raw among the multimodal KRL models compared; in triple classification it clearly beats all models, performing best on the FB-IMG dataset | Split into two modules: a knowledge-reconstruction module embeds each kind of knowledge with different pretrained encoders to reconstruct the multimodal KG, while an adversarial-training (AT) module learns structured and multimodal representations in a joint framework |
Tab. 3 Summary of knowledge graph learning models with image information
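The instance-level attention that IKRL-style models use to aggregate an entity's multiple images can be sketched as follows. The image features and the structural embedding are random stand-ins for encoder outputs; in the papers they come from a trained image encoder and a jointly learned TransE-style embedding.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 50

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Suppose an entity has 4 candidate images; an image encoder (e.g. a CNN
# plus a projection) has already mapped each into the entity space.
img_feats = rng.normal(size=(4, dim))
# The same entity's structure-based embedding, learned jointly.
s = rng.normal(size=dim)

# Instance-level attention: images whose projected features align better
# with the structural embedding receive larger weights.
attn = softmax(img_feats @ s)
visual = attn @ img_feats  # aggregated image-based entity representation

print(attn.shape, visual.shape)
```

The aggregated `visual` vector then enters the translation-based energy alongside the structural embedding, which is how image information supplements the triples in the table above.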
Model | Base dataset(s) | Applicable tasks | Performance notes | Improvement analysis |
---|---|---|---|---|
TransFusion | FB15K[ | Video tag inference | All TransFusion variants beat the base model TransE, including TransFusion-0, which uses no pretrained video embeddings; integrating any extra combination (single, double or triple) of modalities clearly outperforms TransFusion-0 and all baselines | A partially trainable model that fuses KG embeddings with pretrained video embeddings of multiple modalities; combined with a predefined scoring function, the fused video embeddings are used to derive semantic-relation embeddings, which in turn infer tags as a standard link-prediction task |
CLIP-based model[ | Self-built datasets such as CN-DBpedia | Video-relation-tag (VRT) and video-relation-video (VRV) tasks | Gains of 303.4% and 30.2% in HITS@10 on the VRV and VRT tasks respectively, beating all two-stage KGE-based models | First trains the video encoder for video understanding, projects the video embeddings into the same tag embedding space via a CLIP-based model, and finally jointly optimizes the KGE, CLIP, and video-understanding objectives in one model |
Tab. 4 Summary of knowledge graph learning models with audio and video information
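A rough sketch of the TransFusion-style idea in the table above: fuse a KG embedding with pretrained per-modality video embeddings, then reuse translation-based scoring for tag inference. All vectors, the modality weights, and the `hasTag` relation are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 50

# Pretrained video embeddings for one video entity, one per modality
# (e.g. frames, audio, ASR text); values are illustrative.
video_modalities = rng.normal(size=(3, dim))
# Learnable modality weights and a KG-side embedding for the same entity.
alpha = np.array([0.5, 0.3, 0.2])
kg_emb = rng.normal(size=dim)

# Fusion: the video entity's final embedding combines its KG embedding
# with a weighted sum of its per-modality video embeddings.
video_emb = kg_emb + alpha @ video_modalities

def score(head, rel, tail):
    """Translation-based scoring reused for tag inference: a tag is
    plausible for the video if head + rel lands near the tag embedding."""
    return -np.linalg.norm(head + rel - tail)

rel = rng.normal(size=dim)   # hypothetical 'hasTag' relation embedding
tag = rng.normal(size=dim)   # embedding of a candidate tag entity
print(score(video_emb, rel, tag) <= 0)
```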
Model | Base dataset(s) | Applicable tasks | Performance notes | Improvement analysis |
---|---|---|---|---|
PoE | FB15K[ | Link prediction | By fusing three multimodal KGs, verifies the hypothesis that different modalities are complementary for sameAs link prediction; the suffix of PoE-lrni denotes embedding l (latent), r (relational), n (numerical) and i (image) features; PoE-lrni beats the other baseline models, PoE-lrni and PoE-rni perform best, and the embedding experts' responses dominate | Extends a PoE model by incorporating visual information; in a KG, the goal is to learn a PoE that assigns high probability to true triples and low probability to triples assumed false, with one expert per relation type |
MKBE | YAGO-10[ | Link prediction, rating prediction | More accurate in link prediction than the link-prediction models DistMult and ConvE | Replaces the initial layer of any embedding-based relational model with neural encoders and decoders, applied to DistMult and ConvE; suits KG modeling over data types such as text, images, numerical and categorical values |
MMEA | FB15K-DB15K[, FB15K-YAG15K[ | Multimodal knowledge graph entity alignment | Against TransE, MTransE, IPTransE, SEA, GCN, IMUSE and others, MMEA is the best-performing model on both datasets and makes fuller use of limited data | Comprises a multimodal knowledge embedding module and a multimodal knowledge fusion module: the first extracts relational, visual and numerical information to supplement entity features; the second fuses the multimodal knowledge and uses interactive training |
Tab. 5 Summary of knowledge graph representation learning models with multimodal information
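The fusion step that models such as TransAE and MKBE perform can be approximated by the minimal sketch below, which concatenates per-modality features and projects them into a joint entity space. The dimensions and the single tanh-projected linear layer are simplifying assumptions, not any paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
d_struct, d_text, d_img, d_joint = 50, 30, 40, 50

# Pretrained per-modality features for one entity (values illustrative).
x_struct = rng.normal(size=d_struct)
x_text = rng.normal(size=d_text)
x_img = rng.normal(size=d_img)

# A single learned projection fusing the concatenated modalities into a
# joint entity space -- the simplest stand-in for an encoder such as the
# multimodal autoencoder used by TransAE.
W = rng.normal(size=(d_joint, d_struct + d_text + d_img)) * 0.05

def fuse(*modalities):
    return np.tanh(W @ np.concatenate(modalities))

e = fuse(x_struct, x_text, x_img)
print(e.shape)  # the fused vector then plays the role of h or t in scoring
```

In the full models, this projection is trained jointly with the triple-scoring objective (and, for TransAE, a reconstruction loss), so the joint space stays consistent with the KG structure.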
1 | SINGHAL A. Introducing the knowledge graph: things, not strings [EB/OL]. (2012-05-16) [2023-03-12]. . |
2 | YUHAS B P, GOLDSTEIN M H, SEJNOWSKI T J. Integration of acoustic and visual speech signals using neural networks [J]. IEEE Communications Magazine, 1989, 27(11): 65-71. 10.1109/35.41402 |
3 | BALTRUŠAITIS T, AHUJA C, MORENCY L-P. Multimodal machine learning: a survey and taxonomy [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443. 10.1109/tpami.2018.2798607 |
4 | JI S, PAN S, CAMBRIA E, et al. A survey on knowledge graphs: representation, acquisition, and applications [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 494-514. 10.1109/tnnls.2021.3070843 |
5 | BORDES A, USUNIER N, GARCIA-DURÁN A, et al. Translating embeddings for modeling multi-relational data [C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2013, 2: 2787-2795. |
6 | LIN Y, LIU Z, SUN M, et al. Learning entity and relation embeddings for knowledge graph completion [C]// Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2015: 2181-2187. 10.1609/aaai.v29i1.9491 |
7 | WANG Z, ZHANG J, FENG J, et al. Knowledge graph embedding by translating on hyperplanes [C]// Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2014: 1112-1119. 10.1609/aaai.v28i1.8870 |
8 | JI G, HE S, XU L, et al. Knowledge graph embedding via dynamic mapping matrix [C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2015: 687-696. 10.3115/v1/p15-1067 |
9 | XIAO H, HUANG M, HAO Y, et al. TransA: an adaptive approach for knowledge graph embedding [C/OL]// Proceedings of the 2015 AAAI Conference on Artificial Intelligence. [2023-01-05]. . |
10 | NICKEL M, TRESP V, KRIEGEL H-P. A three-way model for collective learning on multi-relational data [C]// Proceedings of the 28th International Conference on Machine Learning. Red Hook: Omnipress, 2011: 809-816. 10.1145/2187836.2187874 |
11 | JENATTON R, ROUX N, BORDES A, et al. A latent factor model for highly multi-relational data [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012, 2: 3167-3175. |
12 | BALAŽEVIĆ I, ALLEN C, HOSPEDALES T. TuckER: tensor factorization for knowledge graph completion [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2019: 5185-5194. 10.18653/v1/d19-1522 |
13 | BORDES A, WESTON J, COLLOBERT R, et al. Learning structured embeddings of knowledge bases [C]// Proceedings of the 25th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2011: 301-306. 10.1609/aaai.v25i1.7917 |
14 | YANG B, YIH W-T, HE X, et al. Embedding entities and relations for learning and inference in knowledge bases [EB/OL]. [2023-01-05]. . |
15 | DETTMERS T, MINERVINI P, STENETORP P, et al. Convolutional 2D knowledge graph embeddings [C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence. Menlo Park: AAAI Press, 2018, 32(1): 1811-1818. 10.1609/aaai.v32i1.11573 |
16 | NGUYEN D Q, NGUYEN T D, NGUYEN D Q, et al. A novel embedding model for knowledge base completion based on convolutional neural network [C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 327-333. 10.18653/v1/n18-2053 |
17 | XIE Z, ZHOU G, LIU J, et al. ReInceptionE: relation-aware inception network with joint local-global structural information for knowledge graph embedding [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 5929-5939. 10.18653/v1/2020.acl-main.526 |
18 | GUO L, SUN Z, HU W. Learning to exploit long-term relational dependencies in knowledge graphs [J]. Proceedings of Machine Learning Research, 2019, 97: 2505-2514. 10.1162/dint_a_00016 |
19 | MEZNI H, BENSLIMANE D, BELLATRECHE L. Context-aware service recommendation based on knowledge graph embedding [J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(11): 5225-5238. 10.1109/tkde.2021.3059506 |
20 | SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks [C]// Proceedings of the 2018 European Semantic Web Conference. Cham: Springer, 2018: 593-607. 10.1007/978-3-319-93417-4_38 |
21 | LIU X, TAN H, CHEN Q, et al. RAGAT: Relation aware graph attention network for knowledge graph completion [J]. IEEE Access, 2021, 9: 20840-20849. 10.1109/access.2021.3055529 |
22 | LI Z, LIU H, ZHANG Z, et al. Learning knowledge graph embedding with heterogeneous relation attention networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 3961-3973. 10.1109/tnnls.2021.3055147 |
23 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |
24 | DEVLIN J, CHANG M-W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2019, 1: 4171-4186. |
25 | WANG Q, HUANG P, WANG H, et al. CoKE: contextualized knowledge graph embedding [EB/OL]. (2020-04-04) [2023-03-25]. . |
26 | YAO L, MAO C, LUO Y. KG-BERT: BERT for knowledge graph completion [EB/OL]. (2019-09-11) [2023-04-03]. . |
27 | CHEN S, LIU X, GAO J, et al. HittER: hierarchical transformers for knowledge graph embeddings [C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2021: 10395-10407. 10.18653/v1/2021.emnlp-main.812 |
28 | ALAM M M, RONY M R A H, NAYYERI M, et al. Language model guided knowledge graph embeddings [J]. IEEE Access, 2022, 10: 76008-76020. 10.1109/access.2022.3191666 |
29 | CHEN Y, GE X, YANG S, et al. A survey on multimodal knowledge graphs: construction, completion and applications [J]. Mathematics, 2023, 11(8): 1815. 10.3390/math11081815 |
30 | NIU Y, TANG K, ZHANG H, et al. Counterfactual VQA: a cause-effect look at language bias [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12695-12705. 10.1109/cvpr46437.2021.01251 |
31 | ZHAO W, HU Y, WANG H, et al. Boosting entity-aware image captioning with multi-modal knowledge graph [EB/OL]. (2021-07-26) [2023-04-12]. . 10.1109/tmm.2023.3301279 |
32 | LIANG K, MENG L, LIU M, et al. Reasoning over different types of knowledge graphs: static, temporal and multi-modal [EB/OL]. (2023-05-27) [2023-06-16]. . |
33 | WANG M, WANG S, YANG H, et al. Is visual context really helpful for knowledge graph? A representation learning perspective [C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 2735-2743. 10.1145/3474085.3475470 |
34 | SUN R, CAO X, ZHAO Y, et al. Multi-modal knowledge graphs for recommender systems [C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York: ACM, 2020: 1405-1414. 10.1145/3340531.3411947 |
35 | XU G, CHEN H, LI F L, et al. AliMe MKG: a multi-modal knowledge graph for live-streaming e-commerce [C]// Proceedings of the 30th ACM International Conference on Information & Knowledge Management. New York: ACM, 2021: 4808-4812. 10.1145/3459637.3481983 |
36 | LEHMANN J, ISELE R, JAKOB M, et al. DBpedia — a large-scale, multilingual knowledge base extracted from Wikipedia [J]. Semantic Web, 2015, 6(2): 167-195. 10.3233/sw-140134 |
37 | CHEN X, SHRIVASTAVA A, GUPTA A. NEIL: extracting visual knowledge from web data [C]// Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 1409-1416. 10.1109/iccv.2013.178 |
38 | VRANDEČIĆ D, KRÖTZSCH M. Wikidata: a free collaborative knowledgebase [J]. Communications of the ACM, 2014, 57(10): 78-85. 10.1145/2629489 |
39 | FERRADA S, BUSTOS B, HOGAN A. IMGpedia: a linked dataset with content-based analysis of Wikimedia images [C]// Proceedings of the 2017 International Semantic Web Conference. Cham: Springer, 2017: 84-93. 10.1007/978-3-319-68204-4_8 |
40 | LIU Z, WANG S, ZHENG L, et al. Robust ImageGraph: rank-level feature fusion for image search [J]. IEEE Transactions on Image Processing, 2017, 26(7): 3128-3141. 10.1109/tip.2017.2660244 |
41 | LIU Y, LI H, GARCIA-DURAN A, et al. MMKG: multi-modal knowledge graphs [C]// Proceedings of the 2019 European Semantic Web Conference. Cham: Springer, 2019: 459-474. 10.1007/978-3-030-21348-0_30 |
42 | LI M, ZAREIAN A, LIN Y, et al. GAIA: A fine-grained multimedia knowledge extraction system [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics, 2020: 77-86. 10.18653/v1/2020.acl-demos.11 |
43 | KANNAN A V, FRADKIN D, AKROTIRIANAKIS I, et al. Multimodal knowledge graph for deep learning papers and code [C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York: ACM, 2020: 3417-3420. 10.1145/3340531.3417439 |
44 | WANG M, WANG H, QI G, et al. Richpedia: a large-scale, comprehensive multi-modal knowledge graph [J]. Big Data Research, 2020, 22: 100159. 10.1016/j.bdr.2020.100159 |
45 | ALBERTS H, HUANG N, DESHPANDE Y, et al. VisualSem: a high-quality knowledge graph for vision and language [C]// Proceedings of the 1st Workshop on Multilingual Representation Learning. Stroudsburg, PA: Association for Computational Linguistics, 2021: 138-152. 10.18653/v1/2021.mrl-1.13 |
46 | BLOEM P, WILCKE X, VAN BERKEL L, et al. kgbench: A collection of knowledge graph datasets for evaluating relational and multimodal machine learning [C]// Proceedings of the 2021 European Semantic Web Conference. Cham: Springer, 2021: 614-630. 10.1007/978-3-030-77385-4_37 |
47 | WANG Z, LI L, LI Q, et al. Multimodal data enhanced representation learning for knowledge graphs [C]// Proceedings of the 2019 International Joint Conference on Neural Networks. Piscataway: IEEE, 2019: 1-8. 10.1109/ijcnn.2019.8852079 |
48 | WANG Z, ZHANG J, FENG J, et al. Knowledge graph and text jointly embedding [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1591-1601. 10.3115/v1/d14-1167 |
49 | TOUTANOVA K, CHEN D, PANTEL P, et al. Representing text for joint embedding of text and knowledge bases [C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2015: 1499-1509. 10.18653/v1/d15-1174 |
50 | RIEDEL S, YAO L, McCALLUM A, et al. Relation extraction with matrix factorization and universal schemas [C]// Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2013: 74-84. |
51 | WANG Z, LI J. Text-enhanced representation learning for knowledge graph [C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2016: 1293-1299. |
52 | XIE R, LIU Z, JIA J, et al. Representation learning of knowledge graphs with entity descriptions [C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2016: 2659-2665. 10.1609/aaai.v30i1.10329 |
53 | XIE R, LIU Z, SUN M. Representation learning of knowledge graphs with hierarchical types [C]// Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2016: 2965-2971. 10.24963/ijcai.2017/438 |
54 | XIAO H, HUANG M, MENG L, et al. SSP: semantic space projection for knowledge graph embedding with text descriptions [C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2017: 3104-3110. 10.1609/aaai.v31i1.10952 |
55 | AN B, CHEN B, HAN X, et al. Accurate text-enhanced knowledge graph representation learning [C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 745-755. 10.18653/v1/n18-1068 |
56 | CHEN M, TIAN Y, CHANG K-W, et al. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment [C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2018: 3998-4004. 10.24963/ijcai.2018/556 |
57 | BOLLACKER K, EVANS C, PARITOSH P, et al. Freebase: a collaboratively created graph database for structuring human knowledge [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2008: 1247-1250. 10.1145/1376616.1376746 |
58 | Wikipedia. Wikipedia [M]. [S.l.]: PediaPress, 2004. |
59 | BORDES A, GLOROT X, WESTON J, et al. A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation [J]. Machine Learning, 2014, 94: 233-259. 10.1007/s10994-013-5363-6 |
60 | MILLER D. On Nationality [M]. Oxford: Clarendon Press, 1995. 10.2307/2945854 |
61 | LI Z, FENG S, SHI J, et al. Future event prediction based on temporal knowledge graph embedding [J]. Computer Systems Science and Engineering, 2023, 44(3): 2411-2423. 10.32604/csse.2023.026823 |
62 | SOCHER R, CHEN D, MANNING C D, et al. Reasoning with neural tensor networks for knowledge base completion [C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2013: 926-934. |
63 | XIE R, LIU Z, LUAN H, et al. Image-embodied knowledge representation learning [C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2017: 3140-3146. 10.24963/ijcai.2017/438 |
64 | MOUSSELLY-SERGIEH H, BOTSCHEN T, GUREVYCH I, et al. A multimodal translation-based approach for knowledge graph representation learning [C]// Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. Stroudsburg, PA: Association for Computational Linguistics, 2018: 225-234. 10.18653/v1/s18-2027 |
65 | LONIJ V P A, RAWAT A, NICOLAE M-I. Extending knowledge bases using images [C/OL]// Proceedings of the 31st Conference on Neural Information Processing Systems. [2023-01-05]. . |
66 | LIU F, CHEN M, ROTH D, et al. Visual pivoting for (unsupervised) entity alignment [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2021: 4257-4266. 10.1609/aaai.v35i5.16550 |
67 | LU X, WANG L, JIANG Z, et al. MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning [J]. Applied Intelligence, 2022, 52: 7480-7497. 10.1007/s10489-021-02693-9 |
68 | LIANG S, ZHU A, ZHANG J, et al. Hyper-node relational graph attention network for multi-modal knowledge graph completion [J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2): Article No.62. 10.1145/3545573 |
69 | DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database [C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. 10.1109/cvpr.2009.5206848 |
70 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115: 211-252. 10.1007/s11263-015-0816-y |
71 | SUN Z, HU W, LI C. Cross-lingual entity alignment via joint attribute-preserving embedding [C]// Proceedings of the 2017 International Semantic Web Conference. Cham: Springer, 2017: 628-644. 10.1007/978-3-319-68288-4_37 |
72 | TOUTANOVA K, CHEN D. Observed versus latent features for knowledge base and text inference [C]// Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality. Stroudsburg, PA: Association for Computational Linguistics, 2015: 57-66. 10.18653/v1/w15-4007 |
73 | JIN D, QI Z, LUO Y, et al. TransFusion: multi-modal fusion for video tag inference via translation-based knowledge embedding [C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 1093-1101. 10.1145/3474085.3481535 |
74 | SHAN Y, HOENS T R, JIAO J, et al. Deep crossing: web-scale modeling without manually crafted combinatorial features [C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 255-262. 10.1145/2939672.2939704 |
75 | DENG J, SHEN D, PAN H, et al. A unified model for video understanding and knowledge embedding with heterogeneous knowledge graph dataset [EB/OL]. (2023-04-02) [2023-04-19]. . 10.1145/3591106.3592258 |
76 | LI N, SHEN Q, SONG R, et al. MEduKG: A deep-learning-based approach for multi-modal educational knowledge graph construction [J]. Information, 2022, 13(2): Article No. 91. 10.3390/info13020091 |
77 | PEZESHKPOUR P, CHEN L, SINGH S. Embedding multimodal relational data for knowledge base completion [C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 3208-3218. 10.18653/v1/d18-1359 |
78 | CHEN L, LI Z, WANG Y, et al. MMEA: entity alignment for multi-modal knowledge graph [C]// Proceedings of the 2020 International Conference on Knowledge Science, Engineering and Management. Cham: Springer, 2020: 134-147. 10.1007/978-3-030-55130-8_12 |
79 | HARPER F M, KONSTAN J A. The MovieLens datasets: history and context [J]. ACM Transactions on Interactive Intelligent Systems, 2015, 5(4): Article No. 19. 10.1145/2827872 |
80 | PANDIT H J, DEBRUYNE C, O’SULLIVAN D, et al. GConsent — a consent ontology based on the GDPR [C]// Proceedings of the 2019 European Semantic Web Conference. Cham: Springer, 2019: 270-282. 10.1007/978-3-030-21348-0_18 |
81 | WILCKE W X, BLOEM P, DE BOER V, et al. End-to-end entity classification on multimodal knowledge graphs [EB/OL]. (2020-05-25) [2023-05-02]. |
82 | GUO H, TANG J, ZENG W, et al. Multi-modal entity alignment in hyperbolic space [J]. Neurocomputing, 2021, 461: 598-607. 10.1016/j.neucom.2021.03.132 |
83 | CHEN D, LI Z, GU B, et al. Multimodal named entity recognition with image attributes and image knowledge [C]// Proceedings of the 2021 International Conference on Database Systems for Advanced Applications. Cham: Springer, 2021: 186-201. 10.1007/978-3-030-73197-7_12 |
84 | YU J, ZHU Z, WANG Y, et al. Cross-modal knowledge reasoning for knowledge-based visual question answering [J]. Pattern Recognition, 2020, 108: 107563. 10.1016/j.patcog.2020.107563 |
85 | SHI B, JI L, LU P, et al. Knowledge aware semantic concept expansion for image-text matching [C/OL]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. [2023-01-05]. 10.24963/ijcai.2019/720 |
86 | CHAUDHARY C, GOYAL P, PRASAD D N, et al. Enhancing the quality of image tagging using a visio-textual knowledge base [J]. IEEE Transactions on Multimedia, 2020, 22(4): 897-911. 10.1109/tmm.2019.2937181 |
87 | TAO S, QIU R, PING Y, et al. Multi-modal knowledge-aware reinforcement learning network for explainable recommendation [J]. Knowledge-Based Systems, 2021, 227: 107217. 10.1016/j.knosys.2021.107217 |
88 | ZHANG C, ZHANG C, ZHENG S, et al. A complete survey on generative AI (AIGC): is ChatGPT from GPT-4 to GPT-5 all you need? [EB/OL]. (2023-03-21) [2023-05-23]. |