Spatio-temporal two-stream human action recognition model based on video deep learning

doi:10.11772/j.issn.1001-9081.2017071740

Journal of Computer Applications, 2018, Vol. 38, Issue (3): 895-899.

• Frontier, Interdisciplinary and Comprehensive Applications •

YANG Tianming¹, CHEN Zhi¹, YUE Wenjing²

1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China;
2. College of Communication and Information Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China

Received: 2017-07-14 Revised: 2017-09-07 Online: 2018-03-07 Published: 2018-03-10
Corresponding author: CHEN Zhi
About the authors: YANG Tianming (born 1993), male, from Nantong, Jiangsu, M.S. candidate; his research interests include machine learning and video data mining. CHEN Zhi (born 1978), male, from Huai'an, Jiangsu, professor, master's supervisor, Ph.D., CCF member; his research interests include sensor networks, cyber-physical systems, machine learning, data mining, and agent and multi-agent systems. YUE Wenjing (born 1982), female, from Ying County, Shanxi, associate professor, Ph.D.; her research interests include cognitive radio networks and data mining.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61501253), the Basic Research Program of Jiangsu Province (Natural Science Foundation) (BK20151506), the 11th Six Talent Peaks Program of Jiangsu Province (XXRJ-009), the Key Research and Development Program (Social Development) of Jiangsu Province (BE2016778), and the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY217054).

Abstract

Deep learning has achieved good results in human action recognition, but existing approaches still do not make full use of the appearance information and motion information of the people in a video. To recognize human actions using both the spatial and the temporal information in video, a spatio-temporal two-stream video human action recognition model is proposed. The model first uses two convolutional neural networks to extract the spatial and temporal features of video action clips respectively, then fuses the two networks and extracts mid-level spatio-temporal features, and finally feeds the extracted mid-level features into a 3D convolutional neural network to recognize the human actions in the video. Human action recognition experiments were carried out on the UCF101 and HMDB51 datasets. The experimental results show that the proposed 3D convolutional neural network model based on the spatio-temporal two-stream can effectively recognize human actions in video.

Key words: human action recognition, spatio-temporal model, deep learning, Convolutional Neural Network (CNN), video mining
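The data flow described in the abstract (two per-stream feature extractors, fusion, then 3D convolution) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not name the fusion rule or any layer sizes, so channel concatenation, the tiny tensor sizes, and the helper names `constant_features`, `fuse_streams`, and `conv3d_valid` are all assumptions made for illustration. The two input lists stand in for the outputs of the spatial and temporal CNNs, which are not modeled here.

```python
from typing import List

Map2D = List[List[float]]      # [H][W]
FrameFeat = List[Map2D]        # [C][H][W], features of one time step
Volume = List[List[Map2D]]     # [C][T][H][W]


def constant_features(value: float, steps: int, channels: int,
                      height: int, width: int) -> List[FrameFeat]:
    """Hypothetical stand-in for one CNN stream: constant maps, [T][C][H][W]."""
    return [[[[value] * width for _ in range(height)]
             for _ in range(channels)] for _ in range(steps)]


def fuse_streams(spatial: List[FrameFeat], temporal: List[FrameFeat]) -> Volume:
    """Concatenate the channels of both streams per time step, then stack over time.

    Concatenation is an assumed fusion rule; the abstract does not specify one.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must cover the same number of time steps")
    fused_steps = [s + t for s, t in zip(spatial, temporal)]  # [T][C][H][W]
    channels = len(fused_steps[0])
    # Reorder [T][C][H][W] -> [C][T][H][W] so time becomes a convolution axis.
    return [[step[c] for step in fused_steps] for c in range(channels)]


def conv3d_valid(volume: Volume, kernel: Volume) -> List[Map2D]:
    """One-filter 3D convolution (cross-correlation), stride 1, no padding."""
    C, T = len(volume), len(volume[0])
    H, W = len(volume[0][0]), len(volume[0][0][0])
    kT, kH, kW = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    if len(kernel) != C:
        raise ValueError("kernel and volume must have the same channel count")
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0.0
                for c in range(C):
                    for dt in range(kT):
                        for dy in range(kH):
                            for dx in range(kW):
                                acc += (volume[c][t + dt][y + dy][x + dx]
                                        * kernel[c][dt][dy][dx])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out  # [T - kT + 1][H - kH + 1][W - kW + 1]


# Toy example: 4 time steps, one 3x3 feature map per stream and step.
spatial = constant_features(1.0, steps=4, channels=1, height=3, width=3)
temporal = constant_features(2.0, steps=4, channels=1, height=3, width=3)
volume = fuse_streams(spatial, temporal)   # [2][4][3][3]
# All-ones kernel of shape [C=2][kT=2][kH=2][kW=2].
kernel = [[[[1.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
response = conv3d_valid(volume, kernel)    # [3][2][2]
print(response[0][0][0])                   # 24.0 = 8 * 1.0 + 8 * 2.0
```

Because the kernel spans two consecutive time steps and both channel groups, every output value mixes appearance and motion features across time, which is the property a 3D convolution over fused two-stream features is meant to provide.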

CLC Number: TP391

Cite this article:

YANG Tianming, CHEN Zhi, YUE Wenjing. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899.


