Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 683-689.DOI: 10.11772/j.issn.1001-9081.2023040413

• Artificial intelligence •

Knowledge-guided visual relationship detection model

Yuanlong WANG, Wenbo HU, Hu ZHANG

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
  • Received: 2023-04-13 Revised: 2023-07-04 Accepted: 2023-07-10 Online: 2023-12-04 Published: 2024-03-10
  • Contact: Yuanlong WANG
  • About author: HU Wenbo, born in 1998, M.S. candidate. His research interests include natural language processing and computer vision.
    ZHANG Hu, born in 1979, Ph.D., professor, CCF member. His research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China(62176145)


Abstract:

The task of Visual Relationship Detection (VRD) is to detect the relationships between target objects on the basis of object recognition; it is a key technology for visual understanding and reasoning. Because objects interact and combine freely, the number of candidate relationships between objects explodes combinatorially, producing many weakly correlated entity pairs and thus lowering the recall of subsequent relationship detection. To address this problem, a knowledge-guided visual relationship detection model was proposed. Firstly, visual knowledge was constructed: the entity labels and relationship labels in common visual relationship detection datasets were analyzed statistically, and the co-occurrence frequencies of entities and relationships were taken as visual knowledge. Then, this visual knowledge was used to optimize the combination of entity pairs: the scores of weakly correlated entity pairs were decreased while those of strongly correlated pairs were increased, after which the pairs were ranked by score and the low-scoring pairs were deleted. The relationship scores between entities were optimized in the same knowledge-guided way, thereby improving the recall of the model. The effectiveness of the proposed model was verified on the public datasets VG (Visual Genome) and VRD. In the predicate classification task, compared with the existing model PE-Net (Prototype-based Embedding Network), the proposed model improves the recall rates Recall@50 and Recall@100 by 1.84 and 1.14 percentage points respectively on the VG dataset; compared with Coacher, it improves Recall@20, Recall@50 and Recall@100 by 0.22, 0.32 and 0.31 percentage points respectively on the VRD dataset.
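The knowledge-guided pair pruning described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact scoring function is not given in the abstract, so the blending weight `alpha`, the cutoff `keep_top`, and the normalized co-occurrence table are hypothetical choices made here for exposition.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(annotations):
    """Build visual knowledge: normalized (subject, object) co-occurrence
    frequencies counted over annotated (subject, predicate, object) triples."""
    pair_counts = Counter()
    for subj, _pred, obj in annotations:
        pair_counts[(subj, obj)] += 1
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}

def rank_entity_pairs(detections, cooc, alpha=0.5, keep_top=100):
    """Rescore candidate entity pairs by blending detector confidence with the
    co-occurrence prior, rank them, and delete low-scoring pairs.
    `detections` is a list of (label, confidence) tuples; `alpha` and
    `keep_top` are assumed hyperparameters, not values from the paper."""
    scored = []
    for (s_label, s_conf), (o_label, o_conf) in permutations(detections, 2):
        prior = cooc.get((s_label, o_label), 0.0)   # 0 for unseen combinations
        score = (1 - alpha) * s_conf * o_conf + alpha * prior
        scored.append(((s_label, o_label), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:keep_top]   # weakly correlated pairs are pruned here
```

A pair such as (person, horse) that co-occurs often in the training annotations receives a boosted score, while a rarely annotated combination falls toward the bottom of the ranking and is pruned before relationship classification; the abstract applies the same guidance to the predicate scores themselves.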

Key words: Visual Relationship Detection (VRD), entity pair ranking, combinatorial explosion, co-occurrence frequency, knowledge guidance


CLC Number: