Abstract: The traditional Convolutional Neural Network (CNN) extracts only local features of human behaviors and actions, which leads to low recognition accuracy for similar behaviors. To address this problem, a two-stream behavior recognition method based on a Non-Local Residual Network (NL-ResNet) was proposed. First, the RGB (Red-Green-Blue) frames and dense optical flow maps of the video were extracted and used as the inputs of the spatial and temporal stream networks, respectively, and a pre-processing method combining corner cropping and multi-scale sampling was applied for data augmentation. Second, the residual blocks of the residual network were used to extract the local appearance features and motion features of the video, respectively; the global information of the video was then extracted by the non-local CNN module connected after each residual block, achieving interleaved extraction of the network's local and global features. Finally, the two branch networks were classified more accurately by the A-softmax loss function, and the recognition results were output after weighted fusion. The method makes full use of both global and local features to improve the representation capability of the model. On the UCF101 dataset, NL-ResNet achieves a recognition accuracy of 93.5%, which is 5.5 percentage points higher than that of the original two-stream network. Experimental results show that the proposed model extracts behavior features better and effectively improves behavior recognition accuracy.
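The core of the method above is the non-local module appended after a residual block: each position aggregates information from all other positions, giving the global context that plain convolutions miss, while a residual connection preserves the local features already extracted. Below is a minimal NumPy sketch of the embedded-Gaussian non-local operation in this residual form; the linear maps `theta`, `phi`, `g`, `w` stand in for the 1x1 convolutions of the actual network, and their random initialization is an illustrative assumption, not the paper's trained weights.

```python
import numpy as np

def nonlocal_block(x, seed=0):
    """Simplified embedded-Gaussian non-local operation with a residual
    connection. x: (N, C) array of N flattened space-time positions with
    C channels; returns an array of the same shape, x plus a global
    attention term, so the block can follow a residual block unchanged."""
    rng = np.random.default_rng(seed)
    n, c = x.shape
    # theta, phi, g, w: stand-ins for the module's 1x1 convolutions,
    # initialized small so the residual term starts near identity.
    theta, phi, g, w = (rng.standard_normal((c, c)) * 0.01 for _ in range(4))
    # Pairwise similarity of every position i with every position j.
    f = (x @ theta) @ (x @ phi).T                    # (N, N)
    f = np.exp(f - f.max(axis=1, keepdims=True))     # stabilized exp
    attn = f / f.sum(axis=1, keepdims=True)          # softmax over j
    y = attn @ (x @ g)                               # aggregate global context
    return x + y @ w                                 # residual connection

features = np.ones((4, 3))          # toy stand-in for residual-block output
out = nonlocal_block(features)      # same shape, globally contextualized
```

Because the output keeps the input's shape and reduces to the identity when `w` is zero, the module can be inserted between existing residual blocks without disturbing the pretrained local-feature pathway, which is what enables the crossover of local and global feature extraction described in the abstract.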