Journal of Computer Applications, 2018, Vol. 38, Issue (3): 895-899. DOI: 10.11772/j.issn.1001-9081.2017071740

Spatio-temporal two-stream human action recognition model based on video deep learning

YANG Tianming1, CHEN Zhi1, YUE Wenjing2   

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China;
    2. College of Communication and Information Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
  • Received: 2017-07-14; Revised: 2017-09-07; Online: 2018-03-07; Published: 2018-03-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61501253), the Basic Research Program of Jiangsu Province (Natural Science Foundation) (BK20151506), the 11th Six Talent Peaks Program of Jiangsu Province (XXRJ-009), the Key Research and Development Program (Social Development) of Jiangsu Province (BE2016778), and the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY217054).


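  • Corresponding author: CHEN Zhi
  • About the authors: YANG Tianming (1993-), male, from Nantong, Jiangsu; M.S. candidate; research interests: machine learning, video data mining. CHEN Zhi (1978-), male, from Huai'an, Jiangsu; Ph.D., professor, master's supervisor, CCF member; research interests: sensor networks, cyber-physical systems, machine learning, data mining, agents and multi-agent systems. YUE Wenjing (1982-), female, from Ying County, Shanxi; Ph.D., associate professor; research interests: cognitive radio networks, data mining.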

Abstract: Deep learning has achieved good results in human action recognition, but the appearance and motion information of people in videos still needs to be fully exploited. To recognize human actions using both the spatial and the temporal information in video, a spatio-temporal two-stream model for video human action recognition was proposed. In the proposed model, two Convolutional Neural Networks (CNNs) were first used to extract the spatial and temporal features of video action clips respectively; the two networks were then fused to extract mid-level spatio-temporal features; finally, these mid-level features were fed into a 3D CNN to recognize the human actions in the video. Video human action recognition experiments were carried out on the UCF101 and HMDB51 datasets. Experimental results show that the proposed two-stream 3D CNN model can effectively recognize human actions in video.
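A minimal NumPy sketch of the data flow described in the abstract may make the pipeline concrete. It rests on several assumptions that the abstract does not state: the spatial stream takes RGB frames and the temporal stream takes two-channel optical-flow fields; each stream is a single 2D convolution layer with ReLU; "fusion" means concatenating the two streams' per-frame feature maps along the channel axis; and the 3D CNN is one 3D convolution followed by global average pooling and a softmax classifier. The names `conv2d`, `conv3d`, `two_stream_forward`, and the `params` keys are hypothetical, and the weights are random, so the sketch illustrates tensor shapes only, not recognition accuracy.

```python
import numpy as np


def conv2d(x, w):
    """Valid 2D convolution (cross-correlation) followed by ReLU.

    x: (C_in, H, W); w: (C_out, C_in, kh, kw) -> (C_out, H-kh+1, W-kw+1).
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C_in, kh, kw)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)


def conv3d(x, w):
    """Valid 3D convolution over (time, height, width) followed by ReLU.

    x: (C_in, T, H, W); w: (C_out, C_in, kt, kh, kw).
    """
    c_out, _, kt, kh, kw = w.shape
    _, t, h, wd = x.shape
    out = np.zeros((c_out, t - kt + 1, h - kh + 1, wd - kw + 1))
    for a in range(t - kt + 1):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                patch = x[:, a:a + kt, i:i + kh, j:j + kw]
                out[:, a, i, j] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return np.maximum(out, 0.0)


def two_stream_forward(rgb, flow, params):
    """Hypothetical two-stream + 3D CNN forward pass (assumed layout, not the paper's).

    rgb:  (T, 3, H, W) RGB frames for the spatial stream.
    flow: (T, 2, H, W) optical-flow fields (dx, dy) for the temporal stream (assumed input).
    Returns class probabilities of shape (num_classes,).
    """
    # Spatial and temporal streams: a per-frame 2D CNN each, stacked along time.
    spatial = np.stack([conv2d(f, params["w_spatial"]) for f in rgb], axis=1)
    temporal = np.stack([conv2d(f, params["w_temporal"]) for f in flow], axis=1)
    # Mid-level fusion (assumed): channel concatenation -> (C_s + C_t, T, H', W').
    fused = np.concatenate([spatial, temporal], axis=0)
    # 3D CNN over the fused spatio-temporal volume.
    st = conv3d(fused, params["w_3d"])
    pooled = st.mean(axis=(1, 2, 3))  # global average pooling -> (C_3d,)
    logits = params["w_fc"] @ pooled  # linear classifier
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Usage with random weights and inputs (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
params = {
    "w_spatial": rng.normal(0, 0.1, (4, 3, 3, 3)),   # 4 spatial feature maps
    "w_temporal": rng.normal(0, 0.1, (4, 2, 3, 3)),  # 4 temporal feature maps
    "w_3d": rng.normal(0, 0.1, (8, 8, 3, 3, 3)),     # 8 = 4 + 4 fused input channels
    "w_fc": rng.normal(0, 0.1, (101, 8)),            # 101 classes, as in UCF101
}
probs = two_stream_forward(rng.random((T, 3, H, W)), rng.normal(0, 1, (T, 2, H, W)), params)
print(probs.shape, probs.argmax())
```

Channel concatenation is used here only because it is the simplest way to merge the two streams into one (channels, time, height, width) volume that a 3D convolution can slide over; the paper's actual fusion method, input modalities, and layer configuration may differ.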

Key words: human action recognition, spatio-temporal model, deep learning, Convolutional Neural Network (CNN), video mining
