Scene graph generation method based on association information enhancement and relationship balance

Journal of Computer Applications, 2025, Vol. 45, Issue (3): 953-962. DOI: 10.11772/j.issn.1001-9081.2024010135

Section: Multimedia computing and computer simulation

Linhao LI¹,²,³, Dong HAN¹, Yongfeng DONG¹,²,³, Yingshuang LI¹,²,³, Zhen WANG¹,²,³

1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
2. Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China
3. Hebei Data Driven Industrial Intelligent Engineering Research Center (Hebei University of Technology), Tianjin 300401, China

Received: 2024-02-05 Revised: 2024-04-18 Accepted: 2024-04-19 Online: 2024-05-09 Published: 2025-03-10
Contact: Yongfeng DONG
About the authors:
LI Linhao, born in 1989 in Weihai, Shandong, Ph. D., associate professor, CCF member. His research interests include machine learning, computer vision, and knowledge inference.
HAN Dong, born in 1998 in Qiqihar, Heilongjiang, M. S. candidate. His research interests include machine learning and computer vision.
LI Yingshuang, born in 1986 in Hengshui, Hebei, M. S., engineer. Her research interests include artificial intelligence.
WANG Zhen, born in 1989 in Tangshan, Hebei, Ph. D., associate professor. His research interests include machine learning, computer vision, and trusted learning.
Supported by:
National Natural Science Foundation of China (62306103)

Abstract:

Contextual information in a scene graph helps a model understand how targets are associated with one another. However, a large number of unrelated targets may introduce additional noise, which disturbs information interaction and causes prediction biases. In noisy and diverse scenes, even a few simple associated targets are sufficient to infer the environment of a target and to resolve the ambiguity of other targets. In addition, the performance of Scene Graph Generation (SGG) is unsatisfactory on the long-tailed, biased data found in real-world scenes. To address the problems of contextual information enhancement and prediction bias, an association Information Enhancement and Relationship Balance based SGG (IERB) method was proposed. IERB adopts a secondary reasoning structure: based on the prediction results of a biased scene graph, association information is reconstructed under different prediction angles of view and the prediction biases are balanced. Firstly, strongly correlated targets under different angles of view were focused on to construct contextual association information. Secondly, a tree-structured balancing strategy was used to enhance the prediction capability for tail relationships. Finally, a prediction-guided approach was used to optimize predictions on the basis of the existing scene graph. Experimental results on the widely used Visual Genome dataset show that, compared with the three baseline models Visual Translation Embedding network (VTransE), Motif, and Visual Context Tree (VCTree), the proposed method improves the mean recall mR@100 in the Predicate Classification (PredCls) task by 11.66, 13.77 and 13.62 percentage points, respectively, which verifies the effectiveness of the proposed method.

Key words: Scene Graph Generation (SGG), information enhancement, biased prediction, relationship balancing, prediction optimization
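
The abstract reports results as mean recall (mR@K) rather than plain recall (R@K). This page does not define the metric, so the following is only a minimal sketch of the commonly used idea: recall is computed separately for each predicate class and then averaged, so that rare (tail) predicates count as much as frequent (head) ones. The function name, the toy triplets, and the choice to pool counts over all images are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_triplets, ranked_predictions, k):
    """Return (R@K, mR@K) for a list of images.

    gt_triplets: one set of (subject, predicate, object) tuples per image.
    ranked_predictions: one list of triplets per image, sorted by confidence.
    Simplification (assumption): hits are pooled over all images.
    """
    hits = defaultdict(int)    # correctly recalled ground-truth triplets per predicate
    totals = defaultdict(int)  # ground-truth triplets per predicate
    for gt, preds in zip(gt_triplets, ranked_predictions):
        top_k = set(preds[:k])
        for triplet in gt:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    # R@K: fraction of all ground-truth triplets that were recalled.
    recall = sum(hits.values()) / sum(totals.values())
    # mR@K: average of the per-predicate recalls.
    per_predicate = [hits[p] / totals[p] for p in totals]
    mean_recall = sum(per_predicate) / len(per_predicate)
    return recall, mean_recall


# Toy image: four head-class "on" relations and one tail-class "riding" relation.
toy_gt = [{
    ("cup", "on", "table"), ("book", "on", "table"),
    ("lamp", "on", "desk"), ("hat", "on", "man"),
    ("man", "riding", "horse"),
}]
# A biased model predicts the frequent predicate "on" for every pair.
toy_preds = [[
    ("cup", "on", "table"), ("book", "on", "table"),
    ("lamp", "on", "desk"), ("hat", "on", "man"),
    ("man", "on", "horse"),
]]
toy_recall, toy_mean_recall = recall_and_mean_recall(toy_gt, toy_preds, k=5)
print(toy_recall, toy_mean_recall)
```

In this toy case the biased predictor still recalls four of the five relations, but it recalls none of the tail class, which is why a per-class average is the more informative measure on long-tailed data.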

CLC number: TP391.41

Linhao LI, Dong HAN, Yongfeng DONG, Yingshuang LI, Zhen WANG. Scene graph generation method based on association information enhancement and relationship balance[J]. Journal of Computer Applications, 2025, 45(3): 953-962.

Figures and tables (14)

Fig. 1 Reasoning results from different angles of view

Fig. 2 Structure of IERB method

Fig. 3 Strongly correlated target selection

Fig. 4 Relationship balance tree

Fig. 5 Potential problem of joint feature

Fig. 6 Separated training of joint feature

Fig. 7 Prediction guidance

Tab. 1 Comparison of experimental results on VG150 dataset (%)

Model	Method	PredCls mR@20	PredCls mR@50	PredCls mR@100	SGCls mR@20	SGCls mR@50	SGCls mR@100	SGDet mR@20	SGDet mR@50	SGDet mR@100
VTransE	Baseline	11.57	14.75	16.11	9.69	11.82	12.79	4.70	6.34	7.58
	Baseline+IERB	16.21	22.89	27.77	10.51	12.91	14.59	5.41	7.34	9.06
	CogTree	20.58	25.80	28.02	9.69	11.82	12.79	5.31	6.97	8.20
	CogTree+IERB	22.47	27.44	29.59	10.99	13.84	14.23	5.35	7.48	9.27
Motif	Baseline	12.56	15.96	17.29	6.25	7.94	8.39	4.45	5.93	6.96
	Baseline+IERB	19.07	26.79	31.06	12.59	16.26	18.86	6.15	8.27	9.96
	TDE	14.64	20.84	24.57	8.07	11.22	12.75	5.09	7.03	8.68
	TDE+IERB	18.99	25.46	28.88	8.65	11.79	13.24	5.30	7.27	9.03
	EBM	12.74	16.24	17.60	7.82	9.69	10.30	4.69	6.07	7.15
	EBM+IERB	18.12	25.20	29.44	11.01	14.87	16.91	5.77	7.90	9.75
VCTree	Baseline	12.85	16.20	17.50	8.03	9.85	10.46	4.38	6.06	7.15
	Baseline+IERB	19.91	26.99	31.12	12.53	16.49	18.82	6.42	8.35	9.89
	TDE	15.56	21.99	25.43	8.44	11.53	13.27	5.26	7.27	8.85
	TDE+IERB	18.19	24.22	27.66	9.65	12.34	14.17	5.26	7.47	8.93
	EBM	14.16	18.11	19.70	9.40	11.46	12.18	4.90	6.76	7.94
	EBM+IERB	17.35	24.39	29.26	12.59	17.07	19.84	5.81	8.05	9.93
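
The improvements quoted in the abstract can be checked directly against the PredCls mR@100 values in Tab. 1. The short sketch below copies the baseline and baseline+IERB values from the table and takes their difference in percentage points. The variable and function names are illustrative only.

```python
# PredCls mR@100 (%) copied from Tab. 1: (baseline, baseline + IERB).
pred_cls_mr100 = {
    "VTransE": (16.11, 27.77),
    "Motif": (17.29, 31.06),
    "VCTree": (17.50, 31.12),
}


def gain_in_points(baseline, improved):
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(improved - baseline, 2)


gains = {model: gain_in_points(*scores) for model, scores in pred_cls_mr100.items()}
for model, gain in gains.items():
    print(f"{model}: +{gain:.2f} percentage points")
```

The differences are 11.66, 13.77 and 13.62 percentage points for VTransE, Motif and VCTree, matching the figures stated in the abstract.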

Tab. 2 Direct comparison results of IERB and unbiased methods on VG150 dataset (%)

Model	Method	PredCls mR@20	PredCls mR@50	PredCls mR@100
Motif	Baseline	12.56	15.96	17.29
	TDE	14.64	20.84	24.57
	EBM	12.74	16.24	17.60
	SG	14.50	18.50	20.20
	CogTree	20.90	26.40	29.00
	DLFE	22.10	26.90	28.80
	IERB	19.07	26.97	31.06
VCTree	Baseline	12.85	16.20	17.50
	TDE	15.56	21.99	25.43
	EBM	14.16	18.11	19.70
	SG	15.00	19.20	21.10
	CogTree	20.78	26.45	29.66
	DLFE	20.80	25.30	27.10
	NARE	18.00	21.70	23.10
	IERB	19.91	26.99	31.12

Fig. 8 Improvement on baseline model Motif

Fig. 9 Visualization of relationship prediction results of the Motif baseline and Motif-IERB

Tab. 3 Ablation experimental results (%)

Strongly correlated target selection	Relationship balance tree	Separated training of joint feature	Prediction guidance	mR@20	mR@50	mR@100
				12.56	15.96	17.29
√				13.54	17.12	19.48
√	√			15.55	23.41	27.97
√	√	√		16.90	23.85	28.66
√	√		√	17.55	24.67	29.06
√	√	√	√	19.07	26.79	31.06

Tab. 4 Influence of balance factor ω on model (mR@K, %)

$ω$	PredCls mR@20	PredCls mR@50	PredCls mR@100	SGCls mR@20	SGCls mR@50	SGCls mR@100
1.0	16.39	23.82	28.53	9.22	11.75	13.79
1.1	17.78	25.07	29.89	11.01	14.84	16.97
1.3	19.07	26.79	31.06	12.59	16.26	18.86
1.5	17.83	24.95	29.33	10.86	13.85	15.99
2.0	14.38	21.45	25.77	7.52	10.03	12.86

Tab. 5 Influence of hyperparameter θ on model (mR@K, %)

$θ$	PredCls mR@20	PredCls mR@50	PredCls mR@100	SGCls mR@20	SGCls mR@50	SGCls mR@100
0.9	16.13	20.55	25.30	8.56	12.99	14.82
0.6	17.54	24.27	29.39	9.89	14.14	16.07
0.3	19.07	26.79	31.06	12.59	16.26	18.86
0.1	16.79	24.01	26.11	9.62	13.20	15.88
