Journal of Computer Applications, 2026, Vol. 46, Issue (3): 767-774. DOI: 10.11772/j.issn.1001-9081.2025030310
• Artificial intelligence •
Xiaoxia LIU1,2,3, Liqun KUANG1,2,3, Song WANG1,2,3, Shichao JIAO1,2,3, Huiyan HAN1,2,3, Fengguang XIONG1,2,3
Received: 2025-03-27
Revised: 2025-04-27
Accepted: 2025-04-28
Online: 2025-05-09
Published: 2026-03-10
Contact: Liqun KUANG
About author: LIU Xiaoxia, born in 2000 in Linfen, Shanxi, M. S. candidate, CCF member. Her research interests include human behavior recognition.
Xiaoxia LIU, Liqun KUANG, Song WANG, Shichao JIAO, Huiyan HAN, Fengguang XIONG. Multi-scale spatio-temporal decoupling for contrastive learning of skeleton action recognition[J]. Journal of Computer Applications, 2026, 46(3): 767-774.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025030310
| Method | Backbone | NTU 60 CS (%) | NTU 60 CV (%) | NTU 120 CS (%) | NTU 120 SS (%) |
|---|---|---|---|---|---|
| CSTCN | GRU | 85.8 | 92.0 | 77.5 | 78.5 |
| HaLP | GRU | 79.7 | 86.8 | 71.1 | 72.2 |
| HiCo | GRU | 82.6 | 90.8 | 75.9 | 77.3 |
| ActCLR | GCN | 84.3 | 88.8 | 74.3 | 75.7 |
| SkeAttnCLR | GCN | 82.0 | 86.5 | 77.1 | 80.0 |
| HYSP | GCN | 79.1 | 85.2 | 64.5 | 67.3 |
| HiCLR | GCN | 80.4 | 85.5 | 70.0 | 70.4 |
| ViA | GCN | 78.1 | 85.8 | 69.2 | 66.9 |
| UmURL | Transformer | 84.4 | 91.4 | 75.9 | 77.2 |
| SCD-Net | GCN+Transformer | 86.4 | 91.3 | 76.7 | 79.6 |
| MSTDCLF | GCN+BiLSTM | 87.5 | 93.0 | 79.3 | 80.6 |

Tab. 1 Comparison of accuracy of different methods on NTU 60 and NTU 120 datasets

| Encoding block | NTU 60 CS (%) | NTU 60 CV (%) | NTU 120 CS (%) | NTU 120 SS (%) |
|---|---|---|---|---|
| Base | 86.4 | 91.3 | 76.7 | 79.6 |
| Base+MSTF | 86.6 | 92.2 | 78.1 | 80.1 |
| Base+BGSCM | 87.3 | 92.3 | 79.0 | 80.2 |
| Base+MSTF+BGSCM | 87.5 | 93.0 | 79.3 | 80.6 |

Tab. 2 Ablation experimental results on effectiveness of encoding blocks

[1] MENG Y B, CHEN T T, YANG X. Convolutional temporal attention and multi-scale information learning for human action detection [J/OL]. Computer Engineering and Applications [2025-02-24].
[2] REN Z, ZHANG Q, GAO X, et al. Multi-modality learning for human action recognition [J]. Multimedia Tools and Applications, 2021, 80(11): 16185-16203.
[3] ZHAO D G, ZHI M. Spatial multiple-temporal graph convolutional neural network for human action recognition [J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 719-732.
[4] DING S, KUANG L Q, CAO Y M, et al. High-precision and lightweight skeleton behavior recognition based on spatial-temporal feature fusion [J]. Computer Engineering, 2025, 51(11): 283-293.
[5] HUANG Q, CUI J W, LI C. A review of skeleton-based human action recognition [J]. Journal of Computer-Aided Design and Computer Graphics, 2024, 36(2): 173-194.
[6] LIN L, ZHANG J, LIU J. Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 2363-2372.
[7] WU Z, SUN P, CHEN X, et al. SelfGCN: graph convolution network with self-attention for skeleton-based action recognition [J]. IEEE Transactions on Image Processing, 2024, 33: 4391-4403.
[8] YU B, YIN H, ZHU Z. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting [C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. California: IJCAI.org, 2018: 3634-3640.
[9] HUA Y, WU W, ZHENG C, et al. Part aware contrastive learning for self-supervised action recognition [C]// Proceedings of the 32nd International Joint Conference on Artificial Intelligence. California: IJCAI.org, 2023: 855-863.
[10] FRANCO L, MANDICA P, MUNJAL B, et al. HYperbolic Self-Paced learning for self-supervised skeleton-based action representations [EB/OL]. [2025-02-23].
[11] WU Z, PAN S, LONG G, et al. Graph WaveNet for deep spatial-temporal graph modeling [C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. California: IJCAI.org, 2019: 1907-1913.
[12] CHEN T, WANG J, SUN Y. Meta-MSGAT: meta multi-scale fused graph attention network [C]// Proceedings of the 2023 International Joint Conference on Neural Networks. Piscataway: IEEE, 2023: 1-8.
[13] PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal Transformer networks [J]. Computer Vision and Image Understanding, 2021, 208/209: No.103219.
[14] SUN S K, LIU D Z, DONG J F, et al. Unified multi-modal unsupervised representation learning for skeleton-based action understanding [C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 2973-2984.
[15] GAO H, JIANG R, DONG Z, et al. Spatial-temporal-decoupled masked pre-training for spatiotemporal forecasting [C]// Proceedings of the 33rd International Joint Conference on Artificial Intelligence. California: IJCAI.org, 2024: 3998-4006.
[16] WU C, WU X J, KITTLER J, et al. SCD-Net: spatiotemporal clues disentanglement network for self-supervised skeleton-based action recognition [C]// Proceedings of the 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 5949-5957.
[17] DONG J, SUN S, LIU Z, et al. Hierarchical contrast for unsupervised skeleton-based action representation learning [C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 525-533.
[18] LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7082-7092.
[19] ZHANG J, LIN L, LIU J. Hierarchical consistent contrastive learning for skeleton-based action recognition with growing augmentations [C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 3427-3435.
[20] LIU J, CHEN C, LIU M. Multi-modality co-learning for efficient skeleton-based action recognition [C]// Proceedings of the 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 4909-4918.
[21] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1010-1019.
[22] LIU J, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2684-2701.
[23] WANG M, LI X, CHEN S, et al. Learning representations by contrastive spatio-temporal clustering for skeleton-based action recognition [J]. IEEE Transactions on Multimedia, 2024, 26: 3207-3220.
[24] SHAH A, ROY A, SHAH K, et al. HaLP: hallucinating latent positives for skeleton-based self-supervised learning of actions [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 18846-18856.
[25] YANG D, WANG Y, DANTCHEVA A, et al. View-invariant skeleton action representation learning via motion retargeting [J]. Image and Vision Computing, 2024, 132(7): 2351-2366.