Journal of Computer Applications, 2024, Vol. 44, Issue (3): 683-689. DOI: 10.11772/j.issn.1001-9081.2023040413
Special Issue: Artificial Intelligence
Yuanlong WANG, Wenbo HU, Hu ZHANG
Received: 2023-04-13
Revised: 2023-07-04
Accepted: 2023-07-10
Online: 2023-12-04
Published: 2024-03-10
Contact: Yuanlong WANG
About author: HU Wenbo, born in 1998 in Yuncheng, Shanxi, M. S. candidate. His research interests include natural language processing and computer vision.
Yuanlong WANG, Wenbo HU, Hu ZHANG. Knowledge-guided visual relationship detection model[J]. Journal of Computer Applications, 2024, 44(3): 683-689.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040413
Model | Predicate classification R@20 | Predicate classification R@50 | Predicate classification R@100 | Phrase detection R@20 | Phrase detection R@50 | Phrase detection R@100 | Relation detection R@20 | Relation detection R@50 | Relation detection R@100 |
---|---|---|---|---|---|---|---|---|---|
RLM | — | 67.93 | 68.20 | — | 26.60 | 33.92 | — | 16.96 | 21.17 |
ViP | — | — | — | — | 16.58 | 21.54 | — | 10.67 | 13.81 |
Motifs | 58.46 | 65.18 | 67.01 | 35.63 | 38.92 | 39.77 | 25.48 | 32.78 | 37.16 |
VCTree | 59.02 | 65.42 | 67.18 | 42.77 | 46.67 | 47.64 | 24.53 | 31.93 | 36.21 |
Transformer | 59.06 | 65.55 | 67.29 | 36.87 | 40.18 | 41.02 | 25.55 | 33.04 | 37.40 |
Coacher | 58.91 | 65.90 | 67.86 | 36.48 | 40.31 | 41.14 | 26.33 | 33.18 | 38.01 |
RU-Net | 61.60 | 67.70 | 69.60 | 37.20 | 39.80 | 40.90 | 22.90 | 31.30 | 34.80 |
NMP | — | 67.03 | 67.29 | — | — | — | — | — | — |
PE-Net | — | 64.90 | 67.20 | — | 39.40 | 40.70 | — | 30.70 | 35.20 |
Proposed model | 59.73 | 66.74 | 68.34 | 37.39 | 41.20 | 41.84 | 26.15 | 33.20 | 38.10 |
Tab. 1 Performance comparison of different models on VG dataset
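The page reports R@K without defining it. As a minimal sketch, assuming the definition commonly used in visual relationship detection (the fraction of an image's ground-truth triplets that appear among its K highest-scoring predictions), the metric could be computed as below. The function name `recall_at_k`, the string-triplet representation, and the omission of bounding-box matching are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triplet = Tuple[str, str, str]


def recall_at_k(
    scored_predictions: List[Tuple[Triplet, float]],
    ground_truth: List[Triplet],
    k: int,
) -> float:
    """Fraction of ground-truth triplets found among the k highest-scoring predictions."""
    unique_truth = set(ground_truth)
    if not unique_truth:
        return 0.0
    # Rank predictions by confidence, highest first, and keep the top k.
    ranked = sorted(scored_predictions, key=lambda item: item[1], reverse=True)
    top_k = {triplet for triplet, _ in ranked[:k]}
    return len(unique_truth & top_k) / len(unique_truth)


predictions = [
    (("person", "ride", "horse"), 0.9),
    (("person", "on", "horse"), 0.8),
    (("horse", "on", "grass"), 0.3),
]
truth = [("person", "ride", "horse"), ("horse", "on", "grass")]
```

With this toy data, `recall_at_k(predictions, truth, 2)` counts only one of the two ground-truth triplets, while `k=3` recovers both. A real evaluation of phrase or relation detection would additionally require the predicted boxes to overlap the annotated ones, which this sketch leaves out.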
Model | R@20 | R@50 | R@100 |
---|---|---|---|
RLM | — | — | 52.19 |
Motifs | 47.70 | 51.84 | 52.28 |
VCTree | 48.19 | 52.23 | 52.71 |
Transformer | 42.30 | 46.74 | 47.76 |
Coacher | 48.09 | 52.08 | 52.79 |
NMP | — | 52.69 | 52.69 |
Proposed model | 48.31 | 52.40 | 53.10 |
Tab. 2 Comparison of predicate classification recall of different models on VRD dataset
Model | zR@20 | zR@50 | zR@100 |
---|---|---|---|
Motifs | 13.05 | 19.03 | 21.98 |
VCTree | 10.35 | 13.63 | 15.64 |
Transformer | 11.04 | 13.27 | 15.51 |
Coacher | 13.42 | 19.31 | 22.22 |
Proposed model | 14.26 | 20.59 | 22.02 |
Tab. 3 Comparison of predicate classification zero-shot recall on VG dataset
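Zero-shot recall (zR@K) is not defined on this page either. The sketch below assumes the usual reading: recall is measured only over ground-truth triplets that never occur in the training set. The name `zero_shot_recall_at_k`, the `train_triplets` argument, and the choice to return `None` for images with no unseen triplets are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triplet = Tuple[str, str, str]


def zero_shot_recall_at_k(
    scored_predictions: List[Tuple[Triplet, float]],
    ground_truth: List[Triplet],
    train_triplets: Set[Triplet],
    k: int,
) -> Optional[float]:
    """Recall@k restricted to ground-truth triplets absent from the training set."""
    unseen = {t for t in ground_truth if t not in train_triplets}
    if not unseen:
        return None  # nothing to evaluate for this image
    # Rank predictions by confidence, highest first, and keep the top k.
    ranked = sorted(scored_predictions, key=lambda item: item[1], reverse=True)
    top_k = {triplet for triplet, _ in ranked[:k]}
    return len(unseen & top_k) / len(unseen)


seen_in_training = {("person", "ride", "horse")}
zs_predictions = [
    (("person", "ride", "horse"), 0.9),
    (("dog", "ride", "skateboard"), 0.6),
    (("dog", "on", "skateboard"), 0.5),
]
zs_truth = [("person", "ride", "horse"), ("dog", "ride", "skateboard")]
```

Here only `("dog", "ride", "skateboard")` is unseen, so the score depends entirely on whether that triplet reaches the top K.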
Model | R@20 | R@50 | R@100 | zR@20 | zR@50 | zR@100 |
---|---|---|---|---|---|---|
BM | 57.91 | 64.90 | 66.86 | 13.42 | 19.31 | 22.22 |
BM+P | 58.56 | 65.10 | 67.56 | 13.07 | 18.91 | 21.97 |
BM+R | 57.87 | 64.95 | 66.96 | 13.58 | 19.80 | 22.48 |
BM+P+R | 59.73 | 65.74 | 67.34 | 14.26 | 20.59 | 22.02 |
Tab. 4 Ablation experiment results
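To read the ablation table as differences against the baseline row, the following sketch subtracts the BM values from each variant. The numbers are copied from Tab. 4 and the row labels are used as given there; the helper name `gains_over_baseline` is hypothetical.

```python
from typing import Dict, List

METRICS: List[str] = ["R@20", "R@50", "R@100", "zR@20", "zR@50", "zR@100"]

# Values copied from Tab. 4, in the column order listed in METRICS.
ABLATION: Dict[str, List[float]] = {
    "BM": [57.91, 64.90, 66.86, 13.42, 19.31, 22.22],
    "BM+P": [58.56, 65.10, 67.56, 13.07, 18.91, 21.97],
    "BM+R": [57.87, 64.95, 66.96, 13.58, 19.80, 22.48],
    "BM+P+R": [59.73, 65.74, 67.34, 14.26, 20.59, 22.02],
}


def gains_over_baseline(
    table: Dict[str, List[float]], baseline: str = "BM"
) -> Dict[str, Dict[str, float]]:
    """Per-metric difference (variant minus baseline), rounded to two decimals."""
    base = table[baseline]
    return {
        name: {m: round(v - b, 2) for m, v, b in zip(METRICS, values, base)}
        for name, values in table.items()
        if name != baseline
    }


GAINS = gains_over_baseline(ABLATION)
```

On these values, BM+P+R is 1.82 points above BM on R@20 and 0.84 points above it on zR@20, while its zR@100 is 0.20 points lower.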