Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 721-726. DOI: 10.11772/j.issn.1001-9081.2020060958

Special topic: Artificial Intelligence

• Artificial Intelligence •

  • Corresponding author: JIANG Li
  • About the authors: JIANG Li, born in 1990 in Hechuan, Chongqing, is a teaching assistant with a master's degree; her research interests include computer vision and intelligent information processing. HUANG Shijian, born in 1983 in Renshou, Sichuan, is an associate professor with a doctoral degree; his research interests include computer vision and machine learning. YAN Wenjuan, born in 1976 in Chongqing, is an associate professor with a master's degree; her research interests include machine learning and pattern recognition.

Human action recognition method based on low-rank action information and multi-scale convolutional neural network

JIANG Li, HUANG Shijian, YAN Wenjuan   

  1. School of Electronic Information Engineering, Yangtze Normal University, Chongqing 408100, China
  • Received:2020-07-06 Revised:2020-10-11 Online:2021-03-10 Published:2021-01-15
  • Supported by:
    This work is partially supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJQN202001421).


Abstract: Traditional methods of acquiring action information for human action recognition require cumbersome steps and rest on various assumptions. Considering the superior performance of Convolutional Neural Network (CNN) in image and video processing, a human action recognition method based on Low-rank Action Information (LAI) and Multi-scale Convolutional Neural Network (MCNN) was proposed. First, the action video was divided into several segments, and the LAI of each segment was extracted by low-rank learning on that segment; the LAI of all segments was then concatenated along the time axis to obtain the LAI of the whole video. This effectively captures the action information in the video while avoiding cumbersome extraction steps and various assumptions. Second, an MCNN model was designed according to the characteristics of LAI. In this model, multi-scale convolution kernels capture the action features of LAI under different receptive fields, and the convolution, pooling and fully connected layers were arranged to further refine these features and finally output the action category. The performance of the proposed method was verified on two benchmark databases, KTH and HMDB51, with three groups of comparison experiments designed and carried out. Experimental results show that the proposed method achieves recognition rates of 97.33% and 72.05% on the two databases, which are at least 0.67 and 1.15 percentage points higher, respectively, than those of the Two-Fold Transformation (TFT) and Deep Temporal Embedding Network (DTEN) methods. The proposed method can further promote the wide application of action recognition technology in security, human-computer interaction and other fields.
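The abstract does not give the optimization used for low-rank learning, so the pipeline it describes (split the video into segments, extract a low-rank summary per segment, concatenate along the time axis) can only be sketched with a stand-in. The sketch below uses truncated SVD as an illustrative low-rank approximation; the segment count, rank, and the mean-over-time collapse into a single image are all assumptions, not the paper's formulation.

```python
import numpy as np

def segment_lai(frames, rank=1):
    """Illustrative low-rank summary of one video segment.
    frames: (t, h, w) array; each frame becomes one column of the data matrix."""
    t, h, w = frames.shape
    D = frames.reshape(t, h * w).T                    # (h*w, t) data matrix
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]   # rank-r approximation of D
    # Collapse the segment's low-rank part into one image (mean over time);
    # this collapse is a guess, the paper may define LAI differently.
    return L.mean(axis=1).reshape(h, w)

def video_lai(video, n_segments=4, rank=1):
    """Split the video into segments, summarize each, and stack the
    per-segment summaries along the time axis -> (n_segments, h, w)."""
    segments = np.array_split(video, n_segments, axis=0)
    return np.stack([segment_lai(seg, rank) for seg in segments])

video = np.random.rand(32, 48, 64)   # toy clip: 32 frames of 48x64 pixels
lai = video_lai(video)
print(lai.shape)                     # (4, 48, 64)
```

The stacked output plays the role of the whole-video LAI that is then fed to the classification network.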

Key words: action recognition, low-rank learning, action information, multi-scale, Convolutional Neural Network (CNN)
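The MCNN described above (parallel convolution kernels of different sizes feeding pooling and fully connected layers) can be sketched as follows, assuming PyTorch. The kernel sizes (3/5/7), channel counts, pooling sizes, and class count are illustrative choices; the paper's exact configuration is not given in the abstract.

```python
import torch
import torch.nn as nn

class MCNN(nn.Module):
    """Sketch of a multi-scale CNN: parallel branches with different
    kernel sizes view the LAI under different receptive fields, their
    features are concatenated, then refined by pooling and a fully
    connected classifier. All hyperparameters here are assumptions."""
    def __init__(self, in_channels=4, n_classes=6):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 16, k, padding=k // 2),
                          nn.ReLU(), nn.MaxPool2d(2))
            for k in (3, 5, 7)                 # three receptive-field scales
        ])
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),           # pool to a fixed spatial size
            nn.Flatten(),
            nn.Linear(3 * 16 * 4 * 4, n_classes)  # fully connected classifier
        )

    def forward(self, x):
        # Concatenate the multi-scale feature maps along the channel axis.
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.head(feats)

model = MCNN()
scores = model(torch.randn(2, 4, 48, 64))  # batch of 2 four-segment LAI stacks
print(scores.shape)                        # torch.Size([2, 6])
```

Treating the per-segment LAI images as input channels is one plausible way to feed the temporal stack to a 2D CNN; the paper may arrange the input differently.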

CLC number: