Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 584-593.DOI: 10.11772/j.issn.1001-9081.2024010139

• Multimedia computing and computer simulation •

Panoptic scene graph generation method based on relation feature enhancement

Linhao LI1,2,3, Yize WANG1, Yingshuang LI1,2,3, Yongfeng DONG1,2,3, Zhen WANG1,2,3

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
    2. Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China
    3. Hebei Data Driven Industrial Intelligent Engineering Research Center (Hebei University of Technology), Tianjin 300401, China
  • Received: 2024-02-06 Revised: 2024-04-11 Accepted: 2024-04-24 Online: 2024-05-09 Published: 2025-02-10
  • Contact: Yingshuang LI
  • About author: LI Linhao, born in 1989, Ph.D., associate professor, CCF member. His research interests include machine learning, computer vision, and knowledge inference.
    WANG Yize, born in 1999, M.S. candidate. His research interests include machine learning and scene graph generation.
    DONG Yongfeng, born in 1977, Ph.D., professor, CCF member. His research interests include artificial intelligence, computer vision, and intelligent information processing.
    WANG Zhen, born in 1989, Ph.D., associate professor. His research interests include machine learning, computer vision, and trusted learning.
  • Supported by:
    Natural Science Foundation of Hebei Higher Education Institutions(QN2023262)


Abstract:

Panoptic Scene Graph Generation (PSGG) aims to identify all objects in an image and automatically capture the intricate semantic associations among them. Modeling these semantic associations depends on feature descriptions of the target objects and subject-object pairs. However, current methods have several limitations: object features extracted from bounding boxes are ambiguous; the methods focus only on the semantic and spatial position features of individual objects, ignoring the joint semantic features and relative position features of subject-object pairs, which are equally essential for accurate relation prediction; and current methods fail to extract features differentially for the different types of subject-object pairs (e.g., foreground-foreground, foreground-background, background-background), ignoring their inherent differences. To address these challenges, a PSGG method based on Relation Feature Enhancement (RFE) was proposed. Firstly, pixel-level mask regional features were introduced to enrich the detailed information in object features, and the joint visual features, joint semantic features, and relative position features of subject-object pairs were fused effectively. Secondly, depending on the specific type of subject-object pair, the most suitable feature extraction method was selected adaptively. Finally, more accurate, enhanced relation features were obtained for relation prediction. Experimental results on the PSG dataset demonstrate that with VCTree (Visual Contexts Tree), Motifs, IMP (Iterative Message Passing), and GPSNet as baseline methods, and ResNet-101 as the backbone network, RFE improves the R@20 metric on the challenging SGGen task by 4.37, 3.68, 2.08, and 1.80 percentage points, respectively. These results validate the effectiveness of the proposed method for PSGG.
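The first limitation the abstract names — bounding-box features mixing in background pixels — can be illustrated with a minimal NumPy sketch. Everything below (the toy feature map, the L-shaped mask, and both pooling functions) is a hypothetical illustration of the general idea, not the paper's implementation:

```python
import numpy as np

def box_pool(feat_map, box):
    """Average features over a rectangular bounding box (y0, x0, y1, x1).
    Every pixel inside the box contributes, object or not."""
    y0, x0, y1, x1 = box
    region = feat_map[y0:y1, x0:x1]
    return region.reshape(-1, feat_map.shape[-1]).mean(axis=0)

def mask_pool(feat_map, mask):
    """Average features over only the pixels selected by a binary mask,
    excluding the background pixels a bounding box would include."""
    return feat_map[mask.astype(bool)].mean(axis=0)

# Toy 4x4 feature map with 2 channels.
feat_map = np.arange(32, dtype=float).reshape(4, 4, 2)

# L-shaped object: its tight bounding box covers pixels the object
# does not actually occupy.
mask = np.array([[1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0]])
box = (0, 0, 3, 2)  # tight box around the mask

box_feat = box_pool(feat_map, box)    # polluted by background pixels
mask_feat = mask_pool(feat_map, mask)  # object pixels only
```

On this toy example the two pooled vectors differ, which is exactly the ambiguity that pixel-level mask features avoid: the box average is pulled toward background pixels that do not belong to the object.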

Key words: Panoptic Scene Graph Generation (PSGG), subject-object pair joint feature, relation feature enhancement, semantic association, adaptive selection
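The type-dependent, adaptive feature extraction described in the abstract can be sketched as a dispatch over subject-object pair types. Everything below is a hypothetical illustration under assumed conventions, not RFE's actual architecture: the relative-position encoding (normalized center offset plus log size ratios) is one common choice, the per-type fusion functions are invented stand-ins, and "thing"/"stuff" stand for foreground and background segments:

```python
import numpy as np

def relative_position(sub_box, obj_box):
    """Encode the relative geometry of a subject-object pair.
    Boxes are (x0, y0, x1, y1)."""
    sw, sh = sub_box[2] - sub_box[0], sub_box[3] - sub_box[1]
    ow, oh = obj_box[2] - obj_box[0], obj_box[3] - obj_box[1]
    scx, scy = sub_box[0] + sw / 2, sub_box[1] + sh / 2
    ocx, ocy = obj_box[0] + ow / 2, obj_box[1] + oh / 2
    return np.array([(ocx - scx) / sw, (ocy - scy) / sh,
                     np.log(ow / sw), np.log(oh / sh)])

# Hypothetical per-type extractors: each pair type gets its own
# fusion of subject features, object features, and geometry.
def extract_fg_fg(sub_feat, obj_feat, rel_pos):
    return np.concatenate([sub_feat, obj_feat, rel_pos])

def extract_fg_bg(sub_feat, obj_feat, rel_pos):
    # Invented choice: omit geometry for amorphous background regions.
    return np.concatenate([sub_feat, obj_feat])

def extract_bg_bg(sub_feat, obj_feat, rel_pos):
    return np.concatenate([(sub_feat + obj_feat) / 2, rel_pos])

EXTRACTORS = {("thing", "thing"): extract_fg_fg,
              ("thing", "stuff"): extract_fg_bg,
              ("stuff", "thing"): extract_fg_bg,
              ("stuff", "stuff"): extract_bg_bg}

def pair_feature(sub, obj):
    """Adaptively select the extractor matching the pair's type."""
    rel = relative_position(sub["box"], obj["box"])
    fn = EXTRACTORS[(sub["kind"], obj["kind"])]
    return fn(sub["feat"], obj["feat"], rel)
```

The dispatch table makes the abstract's point concrete: a foreground-foreground pair and a background-background pair yield relation features built in different ways, rather than being forced through one shared extractor.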

