Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (8): 2236-2240. DOI: 10.11772/j.issn.1001-9081.2020010041

• Artificial intelligence

Behavior recognition method based on two-stream non-local residual network

ZHOU Yun, CHEN Shurong   

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2020-01-16 Revised: 2020-04-20 Online: 2020-08-10 Published: 2020-04-28


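  • Corresponding author: ZHOU Yun (born 1995), female, from Taizhou, Jiangsu; M.S. candidate; research interests: image processing and pattern recognition.
  • About the author: CHEN Shurong (born 1972), female, from Jishan, Shanxi; associate professor, Ph.D.; research interests: modern communication networks and control, image processing, and video analysis and processing.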

Abstract: Traditional Convolutional Neural Networks (CNNs) extract only local features of human actions, which leads to low recognition accuracy for similar behaviors. To address this problem, a behavior recognition method based on a two-stream Non-Local Residual Network (NL-ResNet) was proposed. First, the RGB (Red-Green-Blue) frames and the dense optical flow maps of the video were extracted and used as the inputs of the spatial-stream and temporal-stream networks, respectively, and a pre-processing method combining corner cropping with multiple scales was used for data augmentation. Second, the residual blocks of the residual network were used to extract the local appearance features and motion features of the video, respectively, and the non-local CNN module connected after the residual block then extracted the global information of the video, so that the network interleaved the extraction of local and global features. Finally, each of the two branch networks performed finer-grained classification with the A-softmax loss function, and the recognition result after weighted fusion was output. By making full use of both local and global features, the method improves the representation capability of the model. On the UCF101 dataset, NL-ResNet achieves a recognition accuracy of 93.5%, which is 5.5 percentage points higher than that of the original two-stream network. Experimental results show that the proposed model extracts behavior features better and effectively improves behavior recognition accuracy.
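
The role of the non-local module described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the module's internals, so the sketch assumes a simplified Gaussian-style non-local operation in which the learned embeddings are replaced by identity mappings, and the function name `non_local_block` is hypothetical. Each position attends to all positions through a softmax over dot-product similarities, and the resulting global context is added back to the local feature through a residual connection.

```python
import math


def non_local_block(features):
    """Simplified non-local operation with a residual connection.

    Assumption: identity embeddings stand in for the learned projections
    that a trained non-local module would use.

    features: list of N positions, each a list of C floats.
    Returns a list with the same shape.
    """
    outputs = []
    for x_i in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Softmax over j, shifted by the maximum for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum over all positions: the global context for position i.
        context = [
            sum(w * x_j[c] for w, x_j in zip(weights, features))
            for c in range(len(x_i))
        ]
        # Residual connection: keep the local feature, add the global context.
        outputs.append([x + y for x, y in zip(x_i, context)])
    return outputs
```

In a real network the positions would be the flattened spatial (or spatio-temporal) locations of a feature map produced by a residual block, and the output would feed the next residual block, which is one way to read the interleaving of local and global feature extraction that the abstract describes.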

Key words: behavior recognition, Two-Stream Convolutional Neural Network (Two-Stream ConvNet), non-local, feature extraction, A-softmax

