Human behavior recognition algorithm based on skeletal temporal divergence feature

doi:10.11772/j.issn.1001-9081.2020081178

Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1450-1457.DOI: 10.11772/j.issn.1001-9081.2020081178

Special Issue: Multimedia Computing and Computer Simulation

• Virtual reality and multimedia computing •

TIAN Zhiqiang^1,2,3, DENG Chunhua^1,2,3, ZHANG Junwen^1,2,3

1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
3. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China

Received: 2020-08-06; Revised: 2020-11-15; Online: 2020-12-09; Published: 2021-05-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61806150), the Program of Science and Technology Department of Hubei Province (2018CFB195), the Young Talent Program of Science and Technology Research Plan of Education Department of Hubei Province (Q20181104), the Open Foundation of Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System (znxx2018QN09), and the National Defense Advanced Research Foundation of Wuhan University of Science and Technology (GF201814).

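Corresponding author: DENG Chunhua (邓春华)
About the authors: TIAN Zhiqiang (田志强), born 1996, male, from Wuhan, Hubei, M.S. candidate; research interests: computer vision and machine learning. DENG Chunhua (邓春华), born 1984, male, from Chenzhou, Hunan, associate professor, Ph.D.; research interests: computer vision and machine learning. ZHANG Junwen (张俊雯), born 1997, female, from Jingmen, Hubei, M.S. candidate; research interests: computer vision and machine learning.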

Abstract

Abstract: Human behavior recognition is an important basic technology in fields such as intelligent monitoring, human-computer interaction and robotics. Graph Convolutional Neural Networks (GCNs) achieve excellent performance in skeleton-based human behavior recognition. However, research on GCN-based human behavior recognition faces two problems: 1) human skeleton points are represented by coordinates alone, which lack detailed information about how the skeleton points move; 2) in some videos, the motion amplitude of the human skeleton is so small that the representation information of the key skeleton points is not obvious. To address these problems, a temporal divergence model of skeleton points was first designed to describe the movement states of the skeleton points, which amplified the between-class variance of different human behaviors. Next, an attention mechanism over the temporal divergence features was designed to highlight the key skeleton points and further enlarge the between-class variance. Finally, a two-stream fusion model was constructed based on the complementarity between the spatial features of the original skeleton and the temporal divergence features. The proposed algorithm achieved accuracies of 82.9% and 83.7% under the two partitioning strategies of the authoritative human behavior dataset NTU-RGB+D, which were 1.3 and 0.5 percentage points higher, respectively, than those of the Adaptive Graph Convolutional Network (AGCN). These accuracy improvements demonstrate the effectiveness of the proposed algorithm.

Key words: skeleton, behavior recognition, graph convolution, temporal divergence, attention
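
The three components named in the abstract (a motion descriptor for skeleton points, an attention weighting over joints, and a two-stream fusion) can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's formula for temporal divergence, so the sketch below substitutes a plain frame-to-frame coordinate difference as a stand-in motion feature, a displacement-energy weighting as a stand-in for the attention mechanism, and a weighted average of softmax scores as a stand-in for the fusion step. All function names, the `(T, V, C)` array layout, and the `alpha` weight are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def temporal_difference(skeleton):
    """Stand-in motion feature (assumption, not the paper's definition).

    skeleton: float array of shape (T, V, C) = (frames, joints, coordinates).
    Returns an array of the same shape holding each joint's displacement
    from the previous frame; the first frame is left as zeros.
    """
    motion = np.zeros_like(skeleton)
    motion[1:] = skeleton[1:] - skeleton[:-1]
    return motion

def joint_attention(motion, eps=1e-8):
    """Hypothetical attention: weight each joint by its mean displacement magnitude."""
    energy = np.linalg.norm(motion, axis=-1).mean(axis=0)  # shape (V,)
    return energy / (energy.sum() + eps)                   # weights sum to ~1

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def fuse_two_streams(logits_spatial, logits_motion, alpha=0.5):
    """Hypothetical late fusion: weighted average of per-stream class probabilities."""
    return alpha * softmax(logits_spatial) + (1.0 - alpha) * softmax(logits_motion)

# Toy sequence: 4 frames, 3 joints, 3-D coordinates.
# Joint 0 is static, joint 1 moves 1 unit per frame, joint 2 moves 3 units per frame.
T, V, C = 4, 3, 3
skeleton = np.zeros((T, V, C))
skeleton[:, 1, 0] = np.arange(T) * 1.0
skeleton[:, 2, 0] = np.arange(T) * 3.0

motion = temporal_difference(skeleton)
weights = joint_attention(motion)

# Toy class scores from two hypothetical streams for one sample and 3 classes.
scores = fuse_two_streams(np.array([[2.0, 1.0, 0.0]]), np.array([[0.0, 3.0, 0.0]]))
predicted_class = int(scores.argmax(axis=-1)[0])
```

In this toy sequence the static joint receives zero weight and the fastest joint receives the largest weight, which mirrors the abstract's stated goal of highlighting key skeleton points when overall motion is small. In a real pipeline, each stream's class scores would come from a trained GCN rather than from hand-written arrays.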

CLC Number: TP18

TIAN Zhiqiang, DENG Chunhua, ZHANG Junwen. Human behavior recognition algorithm based on skeletal temporal divergence feature[J]. Journal of Computer Applications, 2021, 41(5): 1450-1457.
