Human-object interaction detection algorithm by fusing local feature enhanced perception
Junyi LIN, Mingxuan CHEN, Yongbin GAO
Journal of Computer Applications    2025, 45 (11): 3713-3720.   DOI: 10.11772/j.issn.1001-9081.2024111662

The core of Human-Object Interaction (HOI) detection is to identify the humans and objects in an image and accurately classify the interactions between them, which is crucial for deeper scene understanding. However, existing algorithms struggle with complex interactions because they capture too little local information, which leads to erroneous human-object associations and difficulty in distinguishing fine-grained actions. To address this limitation, a Local Feature-enhanced Perceptual Module (LFPM) was designed to strengthen the model's ability to capture local feature information by integrating local and non-local feature interactions. This module comprised three key components: the Downsampling Aggregation branch Module (DAM), which acquired low-frequency features through downsampling and aggregated non-local structural information; the Fine-Grained Feature Branch (FGFB) module, which performed parallel convolution operations to supplement the local information extracted by the DAM; and the Multi-Scale Wavelet Convolution (MSWC) module, which further optimized the output features in the spatial and channel dimensions to obtain more precise and comprehensive feature representations. Additionally, to address the limitations of the Transformer in mining local spatial and channel features, a spatial and channel Squeeze and Excitation (scSE) module was introduced. This module allocated attention across the spatial and channel dimensions, enhancing the model's sensitivity to locally salient regions and improving HOI detection accuracy. Finally, the LFPM, the scSE module, and the Transformer architecture were integrated to form the Local Feature Enhancement Perception (LFEP) framework. Experimental results show that, compared with the SQA (Strong guidance Query with self-selected Attention) algorithm, the LFEP framework improves Average Precision by 1.1 percentage points on the V-COCO dataset and mean Average Precision (mAP) by 0.49 percentage points on the HICO-DET dataset. Ablation experiments also validate the effectiveness of each module of LFEP.
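
The abstract describes the scSE module only at a high level. The following is a minimal sketch of the general concurrent spatial and channel squeeze-and-excitation idea, not the authors' implementation. It assumes a single feature map of shape (C, H, W), omits bias terms, and sums the two recalibrated branches (element-wise maximum is another common choice). The function names, the weight shapes, and the reduction ratio of 2 are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, w2):
    """Channel attention: squeeze space, excite channels.

    x:  feature map, shape (C, H, W)
    w1: reduction weights, shape (C // r, C)   (hypothetical shapes)
    w2: expansion weights, shape (C, C // r)
    """
    squeezed = x.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # ReLU bottleneck -> (C // r,)
    gate = sigmoid(w2 @ hidden)                # per-channel gate in (0, 1)
    return x * gate[:, None, None]             # rescale each channel


def spatial_se(x, w):
    """Spatial attention: squeeze channels, excite locations.

    w: 1x1 convolution weights, shape (C,)
    """
    logits = np.tensordot(w, x, axes=([0], [0]))  # 1x1 conv -> (H, W)
    gate = sigmoid(logits)                        # per-location gate in (0, 1)
    return x * gate[None, :, :]                   # rescale each location


def scse(x, w1, w2, w):
    """Combine both recalibrations by summation (an assumed choice)."""
    return channel_se(x, w1, w2) + spatial_se(x, w)


# Usage with random weights; C = 8 channels, reduction ratio r = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8))
w2 = rng.standard_normal((8, 4))
w = rng.standard_normal(8)
y = scse(x, w1, w2, w)
print(y.shape)  # (8, 4, 4)
```

Because each gate lies strictly between 0 and 1, the output keeps the input's shape and only rescales activations: the channel branch emphasizes informative feature channels, while the spatial branch emphasizes salient locations, which is the behavior the abstract attributes to scSE.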
