Human action recognition method based on multi-scale feature fusion of single mode

doi:10.11772/j.issn.1001-9081.2022101473

Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3236-3243. DOI: 10.11772/j.issn.1001-9081.2022101473

• Multimedia computing and computer simulation •

Suolan LIU¹,², Zhenzhen TIAN¹, Hongyuan WANG¹, Long LIN¹, Yan WANG¹

1. School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, Jiangsu 213164, China
2. Jiangsu Key Laboratory of Image and Video Understanding for Social Security (Nanjing University of Science and Technology), Nanjing, Jiangsu 210094, China

Received: 2022-10-11 Revised: 2022-12-29 Accepted: 2023-01-03 Online: 2023-04-12 Published: 2023-10-10
Contact: Hongyuan WANG
About author: LIU Suolan, born in 1980 (native of Taizhou, Jiangsu), Ph. D., associate professor, CCF member. Her research interests include computer vision and artificial intelligence.
TIAN Zhenzhen, born in 1997 (native of Zhengzhou, Henan), M. S. candidate. Her research interests include computer vision and pattern recognition.
LIN Long, born in 1998 (native of Deyang, Sichuan), M. S. candidate. His research interests include computer vision and data augmentation.
WANG Yan, born in 1999 (native of Lianyungang, Jiangsu), M. S. candidate. His research interests include computer vision and pattern recognition.
Supported by: National Natural Science Foundation of China (61976028); Open Project of Jiangsu Key Laboratory of Image and Video Understanding for Social Security (J2021-2)

Abstract

To address the insufficient mining of potential associations between distant joints in human action recognition, as well as the high training cost of using multi-modal data, a multi-scale feature fusion method for human action recognition under a single modality was proposed. Firstly, global feature correlation was performed on the original human skeleton graph, and coarse-scale global features were used to capture the connections between distant joints. Secondly, the global feature correlation graph was partitioned locally to obtain Complementary Subgraphs with Global Features (CSGFs), in which fine-scale features were used to establish strong correlations, so that the features at the two scales complemented each other. Finally, the CSGFs were input into the spatial-temporal graph convolution module for feature extraction, and the extracted results were aggregated to output the final classification result. Experimental results show that the proposed method achieves accuracies of 89.0% (X-sub) and 94.2% (X-view) on the authoritative action recognition dataset NTU RGB+D60. On the challenging large-scale dataset NTU RGB+D120, it achieves 83.3% (X-sub) and 85.0% (X-setup), which are 1.4 and 0.9 percentage points higher than those of the single-modality ST-TR (Spatial-Temporal TRansformer), and 4.1 and 3.5 percentage points higher than those of the lightweight SGN (Semantics-Guided Network), respectively. These results indicate that the proposed method can fully exploit the synergistic complementarity of multi-scale features and effectively improve the recognition accuracy and training efficiency of the model under a single modality.

Key words: human action recognition, skeleton joint, Graph Convolutional Network (GCN), single mode, multi-scale, feature fusion
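
The three stages named in the abstract (global correlation, local partition into subgraphs, aggregation of per-subgraph results) can be illustrated with a rough sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 5-joint toy skeleton `BONES`, the two-part partition `PARTS`, the softmax similarity used as "global correlation", the neighbour averaging that stands in for the spatial-temporal graph convolution module, and the made-up class weights. The temporal dimension is omitted entirely.

```python
import math

# Hypothetical toy skeleton: 5 joints, 4 bones (not the NTU 25-joint layout).
BONES = [(0, 1), (1, 2), (1, 3), (3, 4)]
# Hypothetical partition of the joints into two overlapping parts.
PARTS = [[0, 1, 2], [1, 3, 4]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_correlation(features):
    """Coarse scale: softmax-normalized similarity between every pair of joints."""
    rows = []
    for fi in features:
        scores = [dot(fi, fj) for fj in features]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def mix_globally(features):
    """Give each joint a weighted sum over all joints, so distant joints are linked."""
    weights = global_correlation(features)
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in weights
    ]


def subgraph_descriptor(mixed, part):
    """Fine scale: average each joint with its bone neighbours inside the part, then mean-pool."""
    members = set(part)
    dim = len(mixed[0])
    smoothed = []
    for j in part:
        group = [j] + [b for a, b in BONES if a == j and b in members]
        group += [a for a, b in BONES if b == j and a in members]
        smoothed.append([sum(mixed[g][d] for g in group) / len(group) for d in range(dim)])
    return [sum(row[d] for row in smoothed) / len(smoothed) for d in range(dim)]


def classify(features, class_weights):
    """Sum the per-subgraph class scores and return the index of the best class."""
    mixed = mix_globally(features)
    scores = [0.0] * len(class_weights)
    for part in PARTS:
        desc = subgraph_descriptor(mixed, part)
        for c, w in enumerate(class_weights):
            scores[c] += dot(w, desc)
    return max(range(len(scores)), key=scores.__getitem__)


# One frame of made-up 2-D joint coordinates and two made-up class weight vectors.
frame = [[0.0, 1.0], [0.0, 0.5], [-0.5, 0.4], [0.0, 0.0], [0.3, -0.6]]
label = classify(frame, class_weights=[[1.0, 0.0], [0.0, 1.0]])
```

Because the subgraph descriptors are computed from the globally mixed features rather than the raw ones, each part still carries information from joints outside it, which is one plausible reading of "subgraphs with global features".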

CLC Number: TP391.41

Suolan LIU, Zhenzhen TIAN, Hongyuan WANG, Long LIN, Yan WANG. Human action recognition method based on multi-scale feature fusion of single mode[J]. Journal of Computer Applications, 2023, 43(10): 3236-3243.

Figures/Tables 8

Method	Accuracy/%	Parameters/10⁶
RA-GCN (3s)	87.3	6.21
Shift-GCN (1s)	87.8	0.72
ST-TR (1s)	88.7	6.48
DGNN (2s)	89.9	26.20
PL-GCN	89.2	20.70
PB-GCN	87.5	3.55
Proposed method	89.0	4.10
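
One way to read the accuracy and parameter columns together is to ask which methods are not beaten on both axes at once. The sketch below copies the values from the table above and computes that non-dominated (Pareto) set. The helper names `dominates` and `pareto_front` are illustrative, and parameter count is only a rough proxy for training cost, so this is a reading aid rather than a result reported by the authors.

```python
# (accuracy in %, parameters in millions), copied from the table above.
METHODS = {
    "RA-GCN (3s)": (87.3, 6.21),
    "Shift-GCN (1s)": (87.8, 0.72),
    "ST-TR (1s)": (88.7, 6.48),
    "DGNN (2s)": (89.9, 26.20),
    "PL-GCN": (89.2, 20.70),
    "PB-GCN": (87.5, 3.55),
    "Proposed method": (89.0, 4.10),
}


def dominates(a, b):
    """True if a is at least as accurate and at least as small as b, and not identical to it."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_front(methods):
    """Return the names of methods that no other method dominates."""
    return [
        name
        for name, point in methods.items()
        if not any(dominates(other, point) for other in methods.values())
    ]


print(pareto_front(METHODS))
# prints ['Shift-GCN (1s)', 'DGNN (2s)', 'PL-GCN', 'Proposed method']
```

On these numbers, ST-TR (1s) is dominated by the proposed method (lower accuracy with more parameters), while Shift-GCN (1s) stays on the front because it is much smaller.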

Number of features	Method	X-sub	X-view
Single feature	ST-GCN	81.5	88.3
	Global feature graph	86.7	93.1
	3 subgraphs	86.8	93.3
	4 subgraphs	87.4	93.7
	5 subgraphs	86.9	93.4
	6 subgraphs	87.0	93.2
Multi-feature fusion	Global feature graph + 3 subgraphs	88.8	94.2
	Global feature graph + 4 subgraphs	89.0	94.2
	Global feature graph + 5 subgraphs	88.2	94.1
	Global feature graph + 6 subgraphs	88.7	93.6

Method	X-sub	X-view
ST-GCN	81.5	88.3
PB-GCN	87.5	93.2
SAN	87.2	92.7
SGN	89.0	94.5
PGCN-TCA	88.0	93.6
ST-TR (1s)	88.7	95.6
RA-GCN (3s)	87.3	93.6
MST-GCN (1s)	89.0	95.1
Shift-GCN (1s)	87.8	95.1
SkeleMixCLR (3s)	87.7	94.0
Proposed method	89.0	94.2

Method	X-sub	X-setup
GVFE+AS-GCN with DH-TCN	78.3	79.8
Gimme Signals	70.8	71.6
SkeleMixCLR (3s)	82.0	82.9
Shift-GCN (1s)	80.9	83.2
MST-GCN (1s)	82.8	84.5
RA-GCN (3s)	81.1	82.7
ST-TR (1s)	81.9	84.1
SGN	79.2	81.5
Proposed method	83.3	85.0
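
The percentage-point gains quoted in the abstract can be checked directly against this table. The short sketch below copies the three relevant rows and subtracts them; the function name `gain_over` is illustrative.

```python
# Accuracies (%) on NTU RGB+D120 copied from the table above: (X-sub, X-setup).
ACCURACY = {
    "ST-TR (1s)": (81.9, 84.1),
    "SGN": (79.2, 81.5),
    "Proposed method": (83.3, 85.0),
}


def gain_over(baseline, method="Proposed method"):
    """Return the percentage-point gain of `method` over `baseline` for each protocol."""
    return tuple(
        round(m - b, 1) for m, b in zip(ACCURACY[method], ACCURACY[baseline])
    )


for name in ("ST-TR (1s)", "SGN"):
    print(name, gain_over(name))
# prints:
# ST-TR (1s) (1.4, 0.9)
# SGN (4.1, 3.5)
```

The differences match the 1.4 and 0.9 points over ST-TR and the 4.1 and 3.5 points over SGN stated in the abstract.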
