[1] ZHU F, SHA L, XIE J, et al. From handcrafted to learned representations for human action recognition:a survey[J]. Image and Vision Computing, 2016, 55:42-52.
[2] YAO G, LEI T, ZHONG J. A review of convolutional-neural-network-based action recognition[J]. Pattern Recognition Letters, 2019, 118:14-22.
[3] 李瑞峰,王亮亮,王珂.人体动作行为识别研究综述[J].模式识别与人工智能,2014,27(1):35-48.(LI R F, WANG L L, WANG K. A survey of human body action recognition[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(1):35-48.)
[4] SCOVANNER P, ALI S, SHAH M. A 3-dimensional SIFT descriptor and its application to action recognition[C]//Proceedings of the 15th ACM International Conference on Multimedia. New York:ACM, 2007:357-360.
[5] LAPTEV I, MARSZALEK M, SCHMID C, et al. Learning realistic human actions from movies[C]//Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2008:1-8.
[6] WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1):60-79.
[7] BOBICK A F, DAVIS J W. The recognition of human movement using temporal templates[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(3):257-267.
[8] 李英杰,尹怡欣,邓飞.一种有效的行为识别视频特征[J].计算机应用,2011,31(2):406-409,419.(LI Y J, YIN Y X, DENG F. Effective video feature for action recognition[J]. Journal of Computer Applications, 2011, 31(2):406-409, 419.)
[9] LIU J, KUIPERS B, SAVARESE S. Recognizing human actions by attributes[C]//Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2011:3337-3344.
[10] WANG H, SCHMID C. Action recognition with improved trajectories[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2013:3551-3558.
[11] LAN Z, LIN M, LI X, et al. Beyond Gaussian pyramid:multi-skip feature stacking for action recognition[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:204-212.
[12] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. New York:Curran Associates Inc., 2012:1097-1105.
[13] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL].[2019-03-31]. https://arxiv.org/pdf/1409.1556.pdf.
[14] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:1-9.
[15] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:770-778.
[16] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:2261-2269.
[17] JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1):221-231.
[18] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE, 2015:4489-4497.
[19] TRAN D, RAY J, SHOU Z, et al. ConvNet architecture search for spatiotemporal feature learning[EB/OL].[2019-03-31]. https://arxiv.org/pdf/1708.05038.pdf.
[20] HARA K, KATAOKA H, SATOH Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:6546-6555.
[21] TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2018:6450-6459.
[22] CHEN Y, KALANTIDIS Y, LI J, et al. Multi-fiber networks for video recognition[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11205. Cham:Springer, 2018:364-380.
[23] YANG H, YUAN C, LI B, et al. Asymmetric 3D convolutional neural networks for action recognition[J]. Pattern Recognition, 2019, 85:1-12.
[24] DIBA A, FAYYAZ M, SHARMA V, et al. Spatio-temporal channel correlation networks for action classification[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11208. Cham:Springer, 2018:284-299.
[25] HUSSEIN N, GAVVES E, SMEULDERS A W M. Timeception for complex action recognition[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2019:254-263.
[26] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Proceedings of the 2014 International Conference on Neural Information Processing Systems. New York:Curran Associates Inc., 2014:568-576.
[27] WANG L, XIONG Y, WANG Z, et al. Towards good practices for very deep two-stream ConvNets[EB/OL].[2019-03-31]. https://arxiv.org/pdf/1507.02159.pdf.
[28] WANG L, XIONG Y, WANG Z, et al. Temporal segment networks:towards good practices for deep action recognition[C]//Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham:Springer, 2016:20-36.
[29] FEICHTENHOFER C, PINZ A, WILDES R P. Spatiotemporal residual networks for video action recognition[C]//Proceedings of the 2016 International Conference on Neural Information Processing Systems. New York:Curran Associates Inc., 2016:3468-3476.
[30] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:1933-1941.
[31] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the Kinetics dataset[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:4724-4733.
[32] MA S, SIGAL L, SCLAROFF S. Learning activity progression in LSTMs for activity detection and early detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:1942-1950.
[33] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:2625-2634.
[34] NG J Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets:deep networks for video classification[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2015:4694-4702.
[35] WU Z, WANG X, JIANG Y, et al. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification[C]//Proceedings of the 23rd ACM International Conference on Multimedia. New York:ACM, 2015:461-470.
[36] SCHULDT C, LAPTEV I, CAPUTO B. Recognizing human actions:a local SVM approach[C]//Proceedings of the 17th International Conference on Pattern Recognition. Piscataway:IEEE, 2004:32-36.
[37] DOLLAR P, RABAUD V, COTTRELL G, et al. Behavior recognition via sparse spatio-temporal features[C]//Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. Piscataway:IEEE, 2005:65-72.
[38] TAYLOR G W, FERGUS R, LECUN Y, et al. Convolutional learning of spatio-temporal features[C]//Proceedings of the 2010 European Conference on Computer Vision, LNCS 6316.
Berlin:Springer, 2010:140-153.