Multi-scale spatio-temporal decoupling for contrastive learning of skeleton action recognition
Xiaoxia LIU, Liqun KUANG, Song WANG, Shichao JIAO, Huiyan HAN, Fengguang XIONG
Journal of Computer Applications    2026, 46 (3): 767-774.   DOI: 10.11772/j.issn.1001-9081.2025030310

To address the challenges of dynamic action modeling and multi-scale temporal fusion in skeleton-based action recognition, an efficient Multi-scale Spatio-Temporal Decoupled Contrastive Learning Framework (MSTDCLF) was proposed. Firstly, a Multi-scale Spatio-Temporal Feature enhancement module (MSTF) was designed that combines depthwise separable convolution with dilated convolution, so as to model short-term motion features and long-term behavior patterns simultaneously. Secondly, a channel-spatial joint attention mechanism was embedded to further strengthen the semantic response between joints and feature channels. Thirdly, a residual network with an attention mechanism was used to alleviate the vanishing gradient problem in deep network structures. Finally, a Bidirectional Gated Spatio-temporal Context Modeling (BGSCM) module was proposed: a spatio-temporal enhancement branch was built on a Bidirectional Long Short-Term Memory (BiLSTM) network, and the decoupled features were propagated along the joint topology and the temporal axis through a gating mechanism, thereby suppressing noise interference and capturing complete action-evolution dependencies. Experimental results show that MSTDCLF achieves accuracies of 87.5% (Cross-Subject (CS)) and 93.0% (Cross-View (CV)) on the NTU RGB+D 60 dataset, and 79.3% (CS) and 80.6% (Cross-Setup (SS)) on the NTU RGB+D 120 dataset, all higher than those of the second-best method, SCD-Net (Spatiotemporal Clues Disentanglement Network). Ablation experiments verify the effectiveness of the multi-scale design and the bidirectional gating mechanism, indicating that MSTDCLF learns efficient action representations and effectively improves accuracy in skeleton-based action recognition.
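
The idea of combining depthwise separable convolution with dilated convolution along the temporal axis can be illustrated with a minimal pure-Python sketch. It is not the authors' MSTF implementation: the abstract does not state kernel sizes, dilation rates, or how the branches are fused, so the function names, the dilation rates, and the fusion by summation below are all assumptions made for illustration only.

```python
def dilated_depthwise_conv1d(x, kernels, dilation):
    """Depthwise temporal convolution with dilation.

    x:       list of channels, each a list of T values (one joint feature over time)
    kernels: one odd-length kernel per channel (depthwise: no channel mixing)
    Zero padding keeps the output length equal to T.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        half = (len(kernel) // 2) * dilation
        steps = len(channel)
        y = []
        for t in range(steps):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = t - half + j * dilation  # taps are spaced `dilation` frames apart
                if 0 <= idx < steps:
                    acc += w * channel[idx]
            y.append(acc)
        out.append(y)
    return out


def pointwise_mix(x, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    steps = len(x[0])
    return [
        [sum(w_oc * x[c][t] for c, w_oc in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def receptive_field(kernel_size, dilation):
    """Number of frames spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_separable(x, dw_kernels, pw_weights, dilations=(1, 2)):
    """Run one depthwise separable branch per dilation rate and sum the branches.

    Summation is an assumed fusion rule; the dilation rates are hypothetical.
    """
    branches = [
        pointwise_mix(dilated_depthwise_conv1d(x, dw_kernels, d), pw_weights)
        for d in dilations
    ]
    return [
        [sum(b[o][t] for b in branches) for t in range(len(branches[0][o]))]
        for o in range(len(branches[0]))
    ]
```

With a kernel of size 3, a branch with dilation 1 spans 3 frames (short-term motion), while a branch with dilation 4 spans 9 frames (longer-term patterns) at the same parameter count, which is the motivation for mixing several dilation rates.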
