Journal of Computer Applications, 2024, Vol. 44, Issue (3): 683-689. DOI: 10.11772/j.issn.1001-9081.2023040413
Special topic: Artificial Intelligence
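Received: 2023-04-13
Revised: 2023-07-04
Accepted: 2023-07-10
Online: 2023-12-04
Published: 2024-03-10
Corresponding author: Yuanlong WANG
About the author: Wenbo HU, born in 1998 in Yuncheng, Shanxi, M.S. candidate. His research interests include natural language processing and computer vision.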
Yuanlong WANG, Wenbo HU, Hu ZHANG
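Abstract:
Visual Relationship Detection (VRD) builds on object recognition by further detecting the relationships between the recognized objects, and is a key technique for visual understanding and reasoning. Because objects can be paired in many ways, the number of candidate relationships grows combinatorially, which produces many weakly related entity pairs and lowers the recall of the subsequent relationship detection. To address this problem, a knowledge-guided visual relationship detection model is proposed. First, visual knowledge is constructed: the entity labels and relationship labels in common visual relationship detection datasets are analyzed and counted, and the resulting co-occurrence frequencies between entities and relationships are taken as the visual knowledge. Then, this visual knowledge is used to optimize the entity-pair combination process: the scores of weakly related entity pairs are lowered and the scores of strongly related entity pairs are raised, after which the entity pairs are ranked by score and the low-scoring pairs are removed. The relationship scores between entities are optimized in the same knowledge-guided way, thereby improving the model's recall. The proposed model is evaluated on the public datasets Visual Genome (VG) and VRD. On the predicate classification task on VG, compared with the existing model PE-Net (Prototype-based Embedding Network), Recall@50 and Recall@100 improve by 1.84 and 1.14 percentage points, respectively. On VRD, compared with Coacher, Recall@20, Recall@50, and Recall@100 improve by 0.22, 0.32, and 0.31 percentage points, respectively.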
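The knowledge-guided scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_knowledge`, `rank_entity_pairs`, `rescore_predicates`), the multiplicative pair score, the `floor` value for unseen pairs, and the blending weight `alpha` are all assumptions chosen for illustration. The abstract only states that co-occurrence frequencies are used to raise or lower scores and that low-scoring entity pairs are removed.

```python
from collections import Counter, defaultdict


def build_knowledge(triples):
    """Count label co-occurrences in annotated (subject, predicate, object) triples."""
    pair_count = Counter()
    pred_count = defaultdict(Counter)
    for subj, pred, obj in triples:
        pair_count[(subj, obj)] += 1
        pred_count[(subj, obj)][pred] += 1
    # Assumed normalization: divide by the count of the most frequent pair.
    max_count = max(pair_count.values(), default=1)
    pair_prior = {pair: n / max_count for pair, n in pair_count.items()}
    # For each pair, the relative frequency of every predicate seen with it.
    pred_prior = {
        pair: {pred: n / sum(counts.values()) for pred, n in counts.items()}
        for pair, counts in pred_count.items()
    }
    return pair_prior, pred_prior


def rank_entity_pairs(detections, pair_prior, keep, floor=0.05):
    """Score every ordered pair of detections and keep the `keep` best.

    detections: list of (label, confidence) tuples from an object detector.
    Pairs never seen in the statistics receive the small assumed prior `floor`.
    Returns a list of (score, subject_index, object_index), best first.
    """
    scored = []
    for i, (subj_label, subj_conf) in enumerate(detections):
        for j, (obj_label, obj_conf) in enumerate(detections):
            if i == j:
                continue
            prior = pair_prior.get((subj_label, obj_label), floor)
            scored.append((subj_conf * obj_conf * prior, i, j))
    scored.sort(reverse=True)
    return scored[:keep]


def rescore_predicates(pred_scores, pair_labels, pred_prior, alpha=0.5):
    """Blend model predicate scores with the predicate prior for this pair."""
    prior = pred_prior.get(pair_labels, {})
    return {
        pred: (1 - alpha) * score + alpha * prior.get(pred, 0.0)
        for pred, score in pred_scores.items()
    }
```

With statistics built from a handful of annotated triples, a frequently co-occurring pair such as ("person", "horse") outranks a pair that never co-occurs, so the latter is the first to be pruned when `keep` is small.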
Yuanlong WANG, Wenbo HU, Hu ZHANG. Knowledge-guided visual relationship detection model[J]. Journal of Computer Applications, 2024, 44(3): 683-689.
Model | Predicate classification R@20 | Predicate classification R@50 | Predicate classification R@100 | Phrase detection R@20 | Phrase detection R@50 | Phrase detection R@100 | Relationship detection R@20 | Relationship detection R@50 | Relationship detection R@100 |
---|---|---|---|---|---|---|---|---|---|
RLM | — | 67.93 | 68.20 | — | 26.60 | 33.92 | — | 16.96 | 21.17 |
ViP | — | — | — | — | 16.58 | 21.54 | — | 10.67 | 13.81 |
Motifs | 58.46 | 65.18 | 67.01 | 35.63 | 38.92 | 39.77 | 25.48 | 32.78 | 37.16 |
VCTree | 59.02 | 65.42 | 67.18 | 42.77 | 46.67 | 47.64 | 24.53 | 31.93 | 36.21 |
Transformer | 59.06 | 65.55 | 67.29 | 36.87 | 40.18 | 41.02 | 25.55 | 33.04 | 37.40 |
Coacher | 58.91 | 65.90 | 67.86 | 36.48 | 40.31 | 41.14 | 26.33 | 33.18 | 38.01 |
RU-Net | 61.60 | 67.70 | 69.60 | 37.20 | 39.80 | 40.90 | 22.90 | 31.30 | 34.80 |
NMP | — | 67.03 | 67.29 | — | — | — | — | — | — |
PE-Net | — | 64.90 | 67.20 | — | 39.40 | 40.70 | — | 30.70 | 35.20 |
Proposed model | 59.73 | 66.74 | 68.34 | 37.39 | 41.20 | 41.84 | 26.15 | 33.20 | 38.10 |
Tab. 1 Performance comparison of different models on VG dataset (%)
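The R@K columns report Recall@K, the share of ground-truth relationships recovered among the K highest-scoring predictions. As a rough illustration, the sketch below computes it for a single image using label matching only. The function name `recall_at_k` and the input layout are assumptions. Evaluation for phrase detection and relationship detection additionally checks bounding-box overlap, and benchmark protocols typically aggregate over many images. Both steps are omitted here.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triples found among the top-k predictions.

    predictions: list of (score, (subject, predicate, object)) tuples.
    ground_truth: iterable of (subject, predicate, object) triples.
    Matching is by labels only (a simplifying assumption).
    """
    truth = set(ground_truth)
    if not truth:
        return 0.0
    # Keep the k predictions with the highest scores.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    predicted = {triple for _, triple in top_k}
    return len(predicted & truth) / len(truth)
```

Because only the top K predictions count, pruning weakly related entity pairs before ranking leaves more of the K slots for plausible relationships, which is how a better ranking can raise R@K.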
Model | R@20 | R@50 | R@100 |
---|---|---|---|
RLM | — | — | 52.19 |
Motifs | 47.70 | 51.84 | 52.28 |
VCTree | 48.19 | 52.23 | 52.71 |
Transformer | 42.30 | 46.74 | 47.76 |
Coacher | 48.09 | 52.08 | 52.79 |
NMP | — | 52.69 | 52.69 |
Proposed model | 48.31 | 52.40 | 53.10 |
Tab. 2 Comparison of predicate classification recall of different models on VRD dataset (%)
Model | zR@20 | zR@50 | zR@100 |
---|---|---|---|
Motifs | 13.05 | 19.03 | 21.98 |
VCTree | 10.35 | 13.63 | 15.64 |
Transformer | 11.04 | 13.27 | 15.51 |
Coacher | 13.42 | 19.31 | 22.22 |
Proposed model | 14.26 | 20.59 | 22.02 |
Tab. 3 Comparison of predicate classification zero-shot recall on VG dataset (%)
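Zero-shot recall (zR@K) is usually understood as Recall@K restricted to ground-truth triples whose subject-predicate-object label combination never occurs in the training set. This page does not define the metric, so that reading is an assumption. A minimal sketch under that assumption, with a hypothetical function name and label-only matching:

```python
def zero_shot_recall_at_k(predictions, ground_truth, train_triples, k):
    """Recall@K computed only over ground-truth triples unseen during training.

    predictions: list of (score, (subject, predicate, object)) tuples.
    ground_truth: iterable of triples annotated for the test image.
    train_triples: iterable of every triple that appears in the training set.
    Returns None when the image has no unseen ground-truth triple.
    """
    unseen = set(ground_truth) - set(train_triples)
    if not unseen:
        return None
    # Keep the k predictions with the highest scores.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    predicted = {triple for _, triple in top_k}
    return len(predicted & unseen) / len(unseen)
```

Under this reading, a prior built purely from training co-occurrence counts gives no direct support to unseen triples, so zR@K measures how well a model generalizes beyond the collected statistics.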
Model | R@20 | R@50 | R@100 | zR@20 | zR@50 | zR@100 |
---|---|---|---|---|---|---|
BM | 57.91 | 64.90 | 66.86 | 13.42 | 19.31 | 22.22 |
BM+P | 58.56 | 65.10 | 67.56 | 13.07 | 18.91 | 21.97 |
BM+R | 57.87 | 64.95 | 66.96 | 13.58 | 19.80 | 22.48 |
BM+P+R | 59.73 | 65.74 | 67.34 | 14.26 | 20.59 | 22.02 |
Tab. 4 Ablation experiment results (%)