[1] YUAN Z, STROUD J C, LU T, et al. Temporal action localization by structured maximal sums[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2017:3215-3223.
[2] LIN T, ZHAO X, SHOU Z. Single shot temporal action detection[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York:ACM, 2017:988-996.
[3] SHOU Z, WANG D, CHANG S F. Temporal action localization in untrimmed videos via multi-stage CNNs[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2016:1049-1058.
[4] SHOU Z, CHAN J, ZAREIAN A, et al. CDC:convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2017:1417-1426.
[5] XU H, DAS A, SAENKO K. R-C3D:region convolutional 3D network for temporal activity detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2017:5794-5803.
[6] ZHAO Y, XIONG Y, WANG L, et al. Temporal action detection with structured segment networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2017:2933-2942.
[7] SCHMIDT M. Graphical model structure learning with l1-regularization[D]. Vancouver:University of British Columbia, 2010:27-32.
[8] SAHA S, SINGH G, SAPIENZA M, et al. Deep learning for detecting multiple space-time action tubes in videos[C]//Proceedings of the 2016 British Machine Vision Conference. Guildford, UK:BMVA Press, 2016:No.58.
[9] ZOLFAGHARI M, OLIVEIRA G L, SEDAGHAT N, et al. Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2017:2923-2932.
[10] SINGH K K, LEE Y J. Hide-and-Seek:forcing a network to be meticulous for weakly-supervised object and action localization[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2017:3544-3553.
[11] BAGAUTDINOV T, ALAHI A, FLEURET F, et al. Social scene understanding:end-to-end multi-person action localization and collective activity recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2017:3425-3434.
[12] CHEN L, ZHAI M, MORI G. Attending to distinctive moments:weakly-supervised attention models for action localization in video[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Piscataway, NJ:IEEE, 2017:328-336.
[13] HOU R, CHEN C, SHAH M. Tube Convolutional Neural Network (T-CNN) for action detection in videos[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2017:5823-5832.
[14] WANG L M, XIONG Y J, LIN D H, et al. UntrimmedNets for weakly supervised action recognition and detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2017:6402-6411.
[15] KLÄSER A, MARSZAŁEK M, SCHMID C, et al. Human focused action localization in video[C]//Proceedings of the 2010 European Conference on Computer Vision, LNCS 6553. Berlin:Springer, 2010:219-233.
[16] WEINZAEPFEL P, HARCHAOUI Z, SCHMID C. Learning to track for spatio-temporal action localization[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2015:3164-3172.
[17] SULTANI W, SHAH M. What if we do not have multiple videos of the same action? Video action localization using Web images[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2016:1077-1085.
[18] LIU C W, WU X, JIA Y. Weakly supervised action recognition and localization using Web images[C]//Proceedings of the 2014 Asian Conference on Computer Vision, LNCS 9007. Berlin:Springer, 2014:642-657.
[19] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2015:4489-4497.
[20] REDMON J, FARHADI A. YOLOv3:an incremental improvement[J]. arXiv E-print, 2018:arXiv:1804.02767.
[21] ZITNICK C L, DOLLÁR P. Edge boxes:locating object proposals from edges[C]//Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Berlin:Springer, 2014:391-405.
[22] CHENG M, ZHANG Z, LIN W, et al. BING:binarized normed gradients for objectness estimation at 300 fps[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2014:3286-3293.
[23] WANG H, SCHMID C. Action recognition with improved trajectories[C]//Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2013:3551-3558.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv E-print, 2015:arXiv:1409.1556.
[25] DO T, ARTIÈRES T. Regularized bundle methods for convex and non-convex risks[J]. The Journal of Machine Learning Research, 2012, 13(1):3539-3583.
[26] LAN T, WANG Y, MORI G. Discriminative figure-centric models for joint action localization and recognition[C]//Proceedings of the 2011 IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE, 2011:2003-2010.
[27] MOSABBEB E A, CABRAL R, DE LA TORRE F, et al. Multi-label discriminative weakly-supervised human activity recognition and localization[C]//Proceedings of the 2014 Asian Conference on Computer Vision, LNCS 9007. Berlin:Springer, 2014:241-258.
[28] TANG K, SUKTHANKAR R, YAGNIK J, et al. Discriminative segment annotation in weakly labeled video[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2013:2483-2490.
[29] SIVA P, RUSSELL C, XIANG T. In defence of negative mining for annotating weakly labelled data[C]//Proceedings of the 2012 European Conference on Computer Vision, LNCS 7574. Berlin:Springer, 2012:594-608.
[30] LIU C W. Analysis and understanding of human action in video[D]. Beijing:Beijing Institute of Technology, 2015:77-78. (in Chinese)