[1] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[2] WU S, XU Y, ZHAO D N. Survey of object detection based on deep convolutional network[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(4): 335-346.
[3] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] MA L, WANG Y X. Fine-grained visual classification based on sparse bilinear convolutional neural network[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(4): 336-344.
[5] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 2625-2634.
[6] IBRAHIM M S, MURALIDHARAN S, DENG Z, et al. A hierarchical deep temporal model for group activity recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1971-1980.
[7] BAGAUTDINOV T, ALAHI A, FLEURET F, et al. Social scene understanding: end-to-end multi-person action localization and collective activity recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3425-3434.
[8] RAMANATHAN V, HUANG J, ABU-EL-HAIJA S, et al. Detecting events and key actors in multi-person videos[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3043-3053.
[9] AMER M R, LEI P, TODOROVIC S. HIRF: hierarchical random field for collective activity recognition in videos[C]//Proceedings of the 2014 European Conference on Computer Vision, LNCS 8694. Cham: Springer, 2014: 572-585.
[10] LAN T, WANG Y, YANG W, et al. Discriminative latent models for recognizing contextual group activities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1549-1562.
[11] LAN T, SIGAL L, MORI G. Social roles in hierarchical models for human activity recognition[C]//Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 1354-1361.
[12] RAMANATHAN V, YAO B, LI F F. Social role discovery in human events[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2475-2482.
[13] CHOI W, SAVARESE S. A unified framework for multi-target tracking and collective activity recognition[C]//Proceedings of the 2012 European Conference on Computer Vision, LNCS 7575. Berlin: Springer, 2012: 215-230.
[14] CHOI W, SHAHID K, SAVARESE S. Learning context for collective activity recognition[C]//Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2011: 3273-3280.
[15] DENG Z, VAHDAT A, HU H, et al. Structure inference machines: recurrent neural networks for analyzing relations in group activity recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4772-4781.
[16] AZAR S M, ATIGH M G, NICKABADI A. A multi-stream convolutional neural network framework for group activity recognition[EB/OL]. [2019-12-26]. https://arxiv.org/pdf/1812.10328.pdf.
[17] SHU T, TODOROVIC S, ZHU S C. CERN: confidence-energy recurrent network for group activity recognition[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4255-4263.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[19] FARNEBÄCK G. Two-frame motion estimation based on polynomial expansion[C]//Proceedings of the 2003 Scandinavian Conference on Image Analysis, LNCS 2749. Berlin: Springer, 2003: 363-370.
[20] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2818-2826.
[21] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988.
[22] OLAH C. Understanding LSTM networks[EB/OL]. [2019-08-27]. https://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[23] FREY B J, DUECK D. Clustering by passing messages between data points[J]. Science, 2007, 315(5814): 972-976.
[24] IBRAHIM M S, MORI G. Hierarchical relational networks for group activity recognition and retrieval[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11207. Cham: Springer, 2018: 742-758.
[25] LI X, CHUAH M C. SBGAR: semantics based group activity recognition[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2895-2904.
[26] HAJIMIRSADEGHI H, YAN W, VAHDAT A, et al. Visual recognition by counting instances: a multi-instance cardinality potential kernel[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 2596-2605.