Journal of Computer Applications, 2025, Vol. 45, Issue (11): 3713-3720. DOI: 10.11772/j.issn.1001-9081.2024111662
Multimedia computing and computer simulation
Junyi LIN, Mingxuan CHEN, Yongbin GAO
Received: 2024-11-22
Revised: 2025-04-09
Accepted: 2025-04-17
Online: 2025-04-22
Published: 2025-11-10
Contact: Mingxuan CHEN
About author: LIN Junyi, born in 1999 in Yantai, Shandong, M. S. candidate. His research interests include human-object interaction detection.
Junyi LIN, Mingxuan CHEN, Yongbin GAO. Human-object interaction detection algorithm by fusing local feature enhanced perception[J]. Journal of Computer Applications, 2025, 45(11): 3713-3720.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024111662
| Type | Method | Default (Full) | Default (Rare) | Default (Non-rare) | Known Object (Full) | Known Object (Rare) | Known Object (Non-rare) |
|---|---|---|---|---|---|---|---|
| One-stage | UnionDet | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
| | IPNet | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
| | PPDM | 21.94 | 13.97 | 24.32 | 24.81 | 17.09 | 27.12 |
| | AS-Net | 24.40 | 22.39 | 25.01 | 27.41 | 25.44 | 28.00 |
| | QPIC | 29.07 | 21.85 | 31.23 | 31.68 | 24.14 | 33.93 |
| | CDT | — | — | — | | | |
| | SQAB | 30.82 | 24.92 | 32.58 | 33.58 | 27.19 | 35.49 |
| | SQA | 31.99 | 25.88 | 32.62 | 35.12 | 32.74 | — |
| Two-stage | TIN | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
| | DRG | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
| | ACP | 20.59 | 15.92 | 21.98 | — | — | — |
| | DJRN | 21.34 | 18.53 | 22.18 | 23.69 | 20.64 | 24.60 |
| | IDN[37] | 23.36 | 22.47 | 23.63 | 26.43 | 25.01 | 26.85 |
| | FCL | 25.27 | 20.57 | 26.67 | 27.71 | 22.34 | 28.93 |
| | TMHOI | 26.95 | 21.28 | 28.56 | — | — | — |
| | OCN | — | — | — | | | |
| | LFEP | 32.48 | 27.12 | 34.05 | 35.09 | 29.58 | 36.16 |
Tab. 1 mAP (%) comparison of different methods on HICO-DET test set
| Method | | | Method | | |
|---|---|---|---|---|---|
| UnionDet | 47.5 | 56.2 | HOTR | 55.2 | 64.4 |
| TIN | 47.8 | 54.2 | QPIC | 58.8 | 61.0 |
| IPNet | 51.0 | — | CDT | 61.43 | — |
| DRG | 51.0 | — | SQAB | — | |
| FCL | 52.4 | — | OCN | — | |
| ACP | 52.9 | — | SQA | — | |
| IDN[37] | 53.3 | 60.3 | LFEP | 66.5 | 68.8 |
| AS-Net | 53.9 | — | | | |
Tab. 2 Comparison of effectiveness of different methods on V-COCO test set
| Method | mAP/% (Default) | mAP/% (Known Object) |
|---|---|---|
| Baseline | 30.75 | 33.12 |
| Baseline+LFPM | 31.61 | 34.26 |
| Baseline+MSWC | 31.20 | 33.60 |
| Baseline+LFPM+MSWC | 32.08 | 34.63 |
| Baseline+scSE | 31.16 | 33.62 |
| LFEP | 32.48 | 35.09 |
Tab. 3 Ablation experiment results on HICO-DET dataset for each module
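The ablation figures are easier to compare as gains over the baseline. The sketch below is not from the paper: it copies the mAP values from Tab. 3 and subtracts the baseline row. The dictionary `ABLATION` and the helper `gains_over_baseline` are hypothetical names introduced for illustration.

```python
# mAP (%) on HICO-DET copied from Tab. 3, as (Default, Known Object).
# The names below are illustrative and do not come from the paper.
ABLATION = {
    "Baseline": (30.75, 33.12),
    "Baseline+LFPM": (31.61, 34.26),
    "Baseline+MSWC": (31.20, 33.60),
    "Baseline+LFPM+MSWC": (32.08, 34.63),
    "Baseline+scSE": (31.16, 33.62),
    "LFEP": (32.48, 35.09),
}


def gains_over_baseline(results, baseline="Baseline"):
    """Return each variant's mAP gain (percentage points) over the baseline row."""
    base_default, base_known = results[baseline]
    return {
        name: (round(default - base_default, 2), round(known - base_known, 2))
        for name, (default, known) in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (gain_default, gain_known) in gains_over_baseline(ABLATION).items():
        print(f"{name:<20} {gain_default:+.2f} {gain_known:+.2f}")
```

Read this way, the full LFEP model gains 1.73 points (Default) and 1.97 points (Known Object) over the baseline, more than any single module or the LFPM+MSWC pair.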
| [1] | SADEGHI M A, FARHADI A. Recognition using visual phrases[C]// Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2011: 1745-1752. |
| [2] | IFTEKHAR A S M, KUMAR S, McEVER R A, et al. GTNet: guided transformer network for detecting human-object interactions[C]// Proceedings of the SPIE 12527, Pattern Recognition and Tracking XXXIV. Bellingham, WA: SPIE, 2023: No.125270Q. |
| [3] | CAO Y, TANG Q, YANG F, et al. Re-mine, learn and reason: exploring the cross-modal semantic correlations for language-guided HOI detection[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 23435-23446. |
| [4] | ZHONG X, DING C, QU X, et al. Polysemy deciphering network for robust human-object interaction detection[J]. International Journal of Computer Vision, 2021, 129(6): 1910-1929. |
| [5] | YANG Y, ZHUANG Y, PAN Y. Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies[J]. Frontiers of Information Technology and Electronic Engineering, 2021, 22(12): 1551-1558. |
| [6] | ZHANG A, LIAO Y, LIU S, et al. Mining the benefits of two-stage and one-stage HOI detection[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 17209-17220. |
| [7] | GONG X, ZHANG Z Y, LIU L, et al. A survey of human-object interaction detection[J]. Journal of Southwest Jiaotong University, 2022, 57(4): 693-704. |
| [8] | GUPTA S, MALIK J. Visual semantic role labeling[EB/OL]. [2024-09-20]. |
| [9] | CHAO Y W, LIU Y, LIU X, et al. Learning to detect human-object interactions[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 381-389. |
| [10] | ZHENG S, XU B, JIN Q. Open-category human-object interaction pre-training via language modeling framework[C]// Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 19392-19402. |
| [11] | ZOU C, WANG B, HU Y, et al. End-to-end human object interaction detection with HOI Transformer[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 11820-11829. |
| [12] | LIAO Y, LIU S, WANG F, et al. PPDM: parallel point detection and matching for real-time human-object interaction detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 479-487. |
| [13] | KIM B, CHOI T, KANG J, et al. UnionDet: union-level detector towards real-time human-object interaction detection[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12360. Cham: Springer, 2020: 498-514. |
| [14] | ZOU C, WANG B, HU Y, et al. Cascaded decoding network for HOI detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 11825-11834. |
| [15] | CHEN M, LIAO Y, LIU S, et al. Reformulating HOI detection as adaptive set prediction[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 9000-9009. |
| [16] | ZHONG X, DING C, QU X, et al. Polysemy deciphering network for human-object interaction detection[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12365. Cham: Springer, 2020: 69-85. |
| [17] | GKIOXARI G, GIRSHICK R, DOLLÁR P, et al. Detecting and recognizing human-object interactions[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8359-8367. |
| [18] | ZHANG Y, PAN Y, YAO T, et al. Exploring structure-aware Transformer over interaction proposals for human-object interaction detection[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 19526-19535. |
| [19] | ZHANG F Z, CAMPBELL D, GOULD S. Efficient two-stage detection of human-object interactions with a novel Unary-Pairwise Transformer[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 20072-20080. |
| [20] | ZHOU D, LIU Z, WANG J, et al. Human-object interaction detection via Disentangled Transformer[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 19546-19555. |
| [21] | GAO C, XU J, ZOU Y, et al. DRG: dual relation graph for human-object interaction detection[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12357. Cham: Springer, 2020: 696-712. |
| [22] | ZHANG F Z, CAMPBELL D, GOULD S. Spatially conditioned graphs for detecting human-object interactions[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 13299-13307. |
| [23] | PARK N, KIM S. How do Vision Transformers work?[EB/OL]. [2025-01-13]. |
| [24] | HENDRYCKS D, GIMPEL K. Gaussian Error Linear Units (GELUs)[EB/OL]. [2024-11-09]. |
| [25] | KUHN H W. The Hungarian method for the assignment problem[M]// JÜNGER M, LIEBLING T M, NADDEF D, et al. 50 years of integer programming 1958-2008. Berlin: Springer, 2010: 29-47. |
| [26] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. Cambridge: MIT Press, 2015: 91-99. |
| [27] | REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. |
| [28] | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. |
| [29] | WANG T, YANG T, DANELLJAN M, et al. Learning human-object interaction detection using interaction points[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4115-4124. |
| [30] | TAMURA M, OHASHI H, YOSHINAGA T. QPIC: query-based pairwise human-object interaction detection with image-wide contextual information[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10405-10414. |
| [31] | ZONG D, SU S. Zero-shot human-object interaction detection via similarity propagation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 17805-17816. |
| [32] | LI J, LAI H, GAO G, et al. SQAB: specific query anchor boxes for human-object interaction detection[J]. Displays, 2023, 80: No.102570. |
| [33] | ZHANG F, SHENG L, GUO B, et al. SQA: strong guidance query with self-selected attention for human-object interaction detection[C]// Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-5. |
| [34] | LI Y L, ZHOU S, HUANG X, et al. Transferable interactiveness knowledge for human-object interaction detection[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3580-3589. |
| [35] | KIM D J, SUN X, CHOI J, et al. Detecting human-object interactions with action co-occurrence priors[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12366. Cham: Springer, 2020: 718-736. |
| [36] | LI Y L, LIU X, LU H, et al. Detailed 2D-3D joint representation for human-object interaction[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10163-10172. |
| [37] | LI Y L, LIU X, WU X, et al. HOI analysis: integrating and decomposing human-object interaction[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 5011-5022. |
| [38] | HOU Z, YU B, QIAO Y, et al. Detecting human-object interaction via fabricated compositional learning[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14641-14650. |
| [39] | ZHU L, LAN Q, VELASQUEZ A, et al. TMHOI: translational model for human-object interaction detection[EB/OL]. [2024-06-20]. |
| [40] | YUAN H, WANG M, NI D, et al. Detecting human-object interactions with object-guided cross-modal calibrated semantics[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 3206-3214. |
| [41] | KIM B, LEE J, KANG J, et al. HOTR: end-to-end human-object interaction detection with transformers[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 74-83. |