Journal of Computer Applications, 2023, Vol. 43, Issue (2): 521-528. DOI: 10.11772/j.issn.1001-9081.2022010017
Special Issue: Multimedia computing and computer simulation
Received: 2022-01-07
Revised: 2022-03-18
Accepted: 2022-04-06
Online: 2022-04-21
Published: 2023-02-10
Contact: Yi ZHANG
About author: NI Ranyan, born in 1998, from Huangshan, Anhui, M. S. candidate. Her research interests include computer vision and action recognition.
Ranyan NI, Yi ZHANG. Action recognition method based on video spatio-temporal features[J]. Journal of Computer Applications, 2023, 43(2): 521-528.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022010017
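Dataset | Method | Year | Accuracy/% |
---|---|---|---|
UCF101 | Ref. [ | 2014 | 88.8 |
UCF101 | Ref. [ | 2015 | 82.3 |
UCF101 | Ref. [ | 2019 | 95.9 |
UCF101 | Ref. [ | 2016 | 94.0 |
UCF101 | Ref. [ | 2017 | 92.0 |
UCF101 | Ref. [ | 2017 | 88.6 |
UCF101 | Ref. [ | 2017 | 93.2 |
UCF101 | Ref. [ | 2019 | 93.6 |
UCF101 | Ref. [ | 2018 | 94.3 |
UCF101 | Ref. [ | 2020 | 95.6 |
UCF101 | Ref. [ | 2020 | 94.7 |
UCF101 | Ref. [ | 2020 | 93.0 |
UCF101 | Ref. [ | 2020 | 94.9 |
UCF101 | Ref. [ | 2021 | 95.6 |
UCF101 | Proposed method | 2022 | 96.5 |
HMDB51 | Ref. [ | 2014 | 59.4 |
HMDB51 | Ref. [ | 2015 | 56.8 |
HMDB51 | Ref. [ | 2019 | 70.7 |
HMDB51 | Ref. [ | 2016 | 68.5 |
HMDB51 | Ref. [ | 2018 | 68.3 |
HMDB51 | Ref. [ | 2017 | 59.2 |
HMDB51 | Ref. [ | 2019 | 69.4 |
HMDB51 | Ref. [ | 2020 | 71.5 |
HMDB51 | Ref. [ | 2020 | 69.7 |
HMDB51 | Ref. [ | 2020 | 72.1 |
HMDB51 | Proposed method | 2022 | 73.1 |
Something-Something-V1 | Ref. [ | 2019 | 45.6 |
Something-Something-V1 | Ref. [ | 2016 | 19.5 |
Something-Something-V1 | Ref. [ | 2018 | 39.6 |
Something-Something-V1 | Ref. [ | 2021 | 43.9 |
Something-Something-V1 | Ref. [ | 2017 | 41.6 |
Something-Something-V1 | Ref. [ | 2018 | 34.4 |
Something-Something-V1 | Ref. [ | 2020 | 46.5 |
Something-Something-V1 | Proposed method | 2022 | 46.6 |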
Tab. 1 Comparison of different methods on three datasets
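The margins implied by Tab. 1 can be checked with a few lines of Python. This is only a minimal sketch: the accuracies are copied from the table in row order, while the dictionary layout and the helper name `margin_over_best_prior` are illustrative assumptions. The prior methods are left unnamed because their reference numbers are missing from the table.

```python
# Accuracies (%) copied from Tab. 1; prior methods are listed in table order.
# The data layout and function name are illustrative, not part of the paper.
PRIOR = {
    "UCF101": [88.8, 82.3, 95.9, 94.0, 92.0, 88.6, 93.2,
               93.6, 94.3, 95.6, 94.7, 93.0, 94.9, 95.6],
    "HMDB51": [59.4, 56.8, 70.7, 68.5, 68.3, 59.2, 69.4, 71.5, 69.7, 72.1],
    "Something-Something-V1": [45.6, 19.5, 39.6, 43.9, 41.6, 34.4, 46.5],
}
PROPOSED = {"UCF101": 96.5, "HMDB51": 73.1, "Something-Something-V1": 46.6}


def margin_over_best_prior(dataset: str) -> float:
    """Proposed accuracy minus the best listed prior accuracy, in percentage points."""
    # Round to one decimal to avoid floating-point noise such as 0.5999999.
    return round(PROPOSED[dataset] - max(PRIOR[dataset]), 1)


if __name__ == "__main__":
    for name in PROPOSED:
        print(f"{name}: +{margin_over_best_prior(name):.1f} points")
```

With the values in Tab. 1, the margins over the best listed prior result are 0.6 points on UCF101, 1.0 point on HMDB51, and 0.1 point on Something-Something-V1.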
Method | Sampled frames | Parameters/10^6 | Floating-point operations/GFLOPs | Accuracy/% |
---|---|---|---|---|
Ref. [ | 8 | 24.3 | 33 | 45.6 |
Ref. [ | 8 | 10.7 | 16 | 19.5 |
Ref. [ | 8 | 47.5 | 32 | 39.6 |
Ref. [ | 8 | 24.6 | 34 | 43.9 |
Ref. [ | 32 | 28.0 | 153 | 41.6 |
Ref. [ | 8 | 18.3 | 33 | 34.4 |
Ref. [ | 8 | 25.6 | 33 | 46.5 |
Proposed method | 8 | 25.7 | 34 | 46.6 |
Tab. 2 Comparison of sampling frames, parameters, FLOPs and accuracy among different methods on Something-Something-V1 dataset
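To read Tab. 2 as a cost/accuracy trade-off, one option is to express the proposed method's parameters and FLOPs as ratios of each compared row. The sketch below does that under stated assumptions: the numbers are copied from the table, the row labels are placeholders (the reference numbers are missing), and `Row` and `relative_cost` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    label: str       # placeholder label; reference numbers are missing in Tab. 2
    frames: int      # sampled frames
    params_m: float  # parameters, in units of 10^6
    gflops: float    # floating-point operations, in GFLOPs
    accuracy: float  # accuracy, in %


# Values copied from Tab. 2 in row order.
ROWS = [
    Row("row 1", 8, 24.3, 33, 45.6),
    Row("row 2", 8, 10.7, 16, 19.5),
    Row("row 3", 8, 47.5, 32, 39.6),
    Row("row 4", 8, 24.6, 34, 43.9),
    Row("row 5", 32, 28.0, 153, 41.6),
    Row("row 6", 8, 18.3, 33, 34.4),
    Row("row 7", 8, 25.6, 33, 46.5),
    Row("proposed", 8, 25.7, 34, 46.6),
]
PROPOSED = ROWS[-1]


def relative_cost(other: Row, ref: Row) -> tuple[float, float, float]:
    """Return (parameter ratio, FLOPs ratio, accuracy difference) of ref versus other."""
    return (
        round(ref.params_m / other.params_m, 2),
        round(ref.gflops / other.gflops, 2),
        round(ref.accuracy - other.accuracy, 1),
    )


if __name__ == "__main__":
    for row in ROWS[:-1]:
        p, f, a = relative_cost(row, PROPOSED)
        print(f"{row.label}: params x{p}, FLOPs x{f}, accuracy {a:+.1f} points")
```

For example, against the 32-frame row the proposed method uses about 0.22 times the FLOPs while scoring 5.0 points higher, and against the first row it costs about 1.06 times the parameters for a 1.0-point gain.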
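Method | Accuracy/% |
---|---|
Baseline | 45.6 |
Baseline + motion information extraction module | 46.0 |
Baseline + spatio-temporal information extraction module | 45.9 |
Baseline + motion information extraction module + spatio-temporal information extraction module | 46.6 |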
Tab. 3 Influence of different modules on network
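The ablation in Tab. 3 can be summarized as gains over the baseline. The following is a small sketch, assuming only the four accuracies in the table; the short keys and variable names are illustrative.

```python
# Accuracies (%) copied from Tab. 3; the short keys are illustrative labels.
BASELINE = 45.6
ABLATION = {
    "motion": 46.0,                    # Baseline + motion information extraction module
    "spatio-temporal": 45.9,           # Baseline + spatio-temporal information extraction module
    "motion + spatio-temporal": 46.6,  # Baseline + both modules
}

# Gain of each configuration over the baseline, in percentage points.
# Rounding to one decimal removes floating-point noise.
GAINS = {name: round(acc - BASELINE, 1) for name, acc in ABLATION.items()}

# Sum of the two single-module gains, for comparison with the combined gain.
SUM_OF_INDIVIDUAL_GAINS = round(GAINS["motion"] + GAINS["spatio-temporal"], 1)

if __name__ == "__main__":
    for name, g in GAINS.items():
        print(f"{name}: +{g:.1f} points")
    print(f"sum of single-module gains: +{SUM_OF_INDIVIDUAL_GAINS:.1f} points")
```

On these numbers the combined gain (1.0 point) is larger than the sum of the single-module gains (0.7 points), which is consistent with the two modules capturing complementary information, although the table by itself does not show how stable these differences are across runs.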
[1] WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1): 60-79. DOI: 10.1007/s11263-012-0594-8
[2] WANG P, PANG W H. Two-stream CNN for action recognition based on video segmentation[J]. Journal of Computer Applications, 2019, 39(7): 2081-2086. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2019010156
[3] KLÄSER A, MARSZAŁEK M, SCHMID C. A spatio-temporal descriptor based on 3D-gradients[C]// Proceedings of the 2008 British Machine Vision Conference. Durham: BMVA Press, 2008: No.99. DOI: 10.5244/c.22.99
[4] GUO M X, SONG Q J, XU Z N, et al. Human behavior recognition algorithm based on three-dimensional residual dense network[J]. Journal of Computer Applications, 2019, 39(12): 3482-3489. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2019061056
[5] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1. Cambridge: MIT Press, 2014: 568-576.
[6] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4489-4497. DOI: 10.1109/iccv.2015.510
[7] LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7082-7092. DOI: 10.1109/iccv.2019.00718
[8] WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 20-36.
[9] LAN Z Z, ZHU Y, HAUPTMANN A G, et al. Deep local video feature for action recognition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2017: 1219-1225. DOI: 10.1109/cvprw.2017.161
[10] LIN W Y, MI Y, WU J X, et al. Action recognition with coarse-to-fine deep feature integration and asynchronous fusion[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 7130-7137. DOI: 10.1609/aaai.v32i1.12232
[11] JI S W, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231. DOI: 10.1109/tpami.2012.59
[12] TRAN D, RAY J, SHOU Z, et al. ConvNet architecture search for spatio-temporal feature learning[EB/OL]. (2017-08-16) [2021-12-26].
[13] CAI J H, HU J G. 3D RANs: 3D residual attention networks for action recognition[J]. The Visual Computer, 2020, 36(6): 1261-1270. DOI: 10.1007/s00371-019-01733-3
[14] ZOLFAGHARI M, SINGH K, BROX T. ECO: efficient convolutional network for online video understanding[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11206. Cham: Springer, 2018: 713-730.
[15] LEE M, LEE S, SON S, et al. Motion feature network: fixed motion filter for action recognition[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11214. Cham: Springer, 2018: 392-408.
[16] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[17] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. DOI: 10.1109/cvpr.2009.5206848
[18] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[19] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[EB/OL]. [2021-12-26].
[20] KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition[C]// Proceedings of the 2011 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2011: 2556-2563. DOI: 10.1109/iccv.2011.6126543
[21] GOYAL R, KAHOU S E, MICHALSKI V, et al. The "something something" video database for learning and evaluating visual common sense[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5843-5851. DOI: 10.1109/iccv.2017.622
[22] TRAN A, CHEONG L F. Two-stream flow-guided convolutional attention networks for action recognition[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2017: 3110-3119. DOI: 10.1109/iccvw.2017.368
[23] QIU Z F, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5534-5542. DOI: 10.1109/iccv.2017.590
[24] DIBA A, FAYYAZ M, SHARMA V, et al. Temporal 3D ConvNets: new architecture and transfer learning for video classification[EB/OL]. (2017-11-22) [2021-12-26].
[25] KAZAKOS E, NAGRANI A, ZISSERMAN A, et al. EPIC-fusion: audio-visual temporal binding for egocentric action recognition[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 5491-5500. DOI: 10.1109/iccv.2019.00559
[26] WANG L M, LI W, LI W, et al. Appearance-and-relation networks for video classification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1430-1439. DOI: 10.1109/cvpr.2018.00155
[27] LI X Y, SHUAI B, TIGHE J. Directional temporal modeling for action recognition[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12351. Cham: Springer, 2020: 275-291.
[28] KUMAWAT S, VERMA M, NAKASHIMA Y, et al. Depthwise spatio-temporal STFT convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 4839-4851.
[29] SAHOO S P, ARI S, MAHAPATRA K, et al. HAR-Depth: a novel framework for human action recognition using sequential learning and depth estimated history images[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2021, 5(5): 813-825. DOI: 10.1109/tetci.2020.3014367
[30] ZHANG J X, HU H F, LIU Z. Appearance-and-dynamic learning with bifurcated convolution neural network for action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(4): 1593-1606. DOI: 10.1109/tcsvt.2020.3006223
[31] BAI S K, WANG Q, LI X L. MFI: multi-range feature interchange for video action recognition[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 6664-6671. DOI: 10.1109/icpr48806.2021.9412124
[32] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the Kinetics dataset[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4724-4733. DOI: 10.1109/cvpr.2017.502
[33] ZHOU B L, ANDONIAN A, OLIVA A, et al. Temporal relational reasoning in videos[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11205. Cham: Springer, 2018: 831-846.
[34] LIU Z Y, WANG L M, WU W, et al. TAM: temporal adaptive module for video recognition[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 13688-13698. DOI: 10.1109/iccv48922.2021.01345
[35] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626. DOI: 10.1109/iccv.2017.74