Abstract: Current group behavior recognition methods do not make full use of group relational information, which limits the recognition accuracy they can reach. To address this, a deep neural network model built on a hierarchical relational module driven by the Affinity Propagation (AP) algorithm is proposed, named Clustering Relational Network (CRN). First, a Convolutional Neural Network (CNN) extracts scene features, and regional feature clustering extracts the features of each person in the scene. Second, the AP-based hierarchical relational network module extracts group relational information. Finally, a Long Short-Term Memory (LSTM) network fuses the individual feature sequences with the group relational information to produce the final group recognition result. Compared with the Multi-Stream Convolutional Neural Network (MSCNN), CRN improves recognition accuracy by 5.39 and 3.33 percentage points on the Volleyball dataset and the Collective Activity dataset, respectively. Compared with the Confidence-Energy Recurrent Network (CERN), CRN improves recognition accuracy by 8.70 and 3.14 percentage points on the Volleyball dataset and the Collective Activity dataset, respectively. These experimental results show that CRN achieves higher recognition accuracy than both baselines on group behavior recognition tasks.
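The clustering step named in the abstract can be illustrated with a minimal sketch of plain Affinity Propagation (message passing between data points, as introduced by Frey and Dueck) written in NumPy. It is not the CRN implementation: the 2D person positions, the negative squared Euclidean similarity, the median preference, the damping value, and the names `affinity_propagation` and `group_people` are all assumptions made for illustration, and the hierarchical relational module and LSTM fusion are not reproduced.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Plain Affinity Propagation on a similarity matrix S (illustrative sketch).

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k, k] holds the preference of point k.
    Returns, for each point, the index of its chosen exemplar.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    idx = np.arange(n)
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        AS = A + S
        first = AS.argmax(axis=1)
        first_val = AS[idx, first]
        AS[idx, first] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - first_val[:, None]
        R_new[idx, first] = S[idx, first] - second_val
        R = damping * R + (1 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag  # self-availability is not clipped at 0
        A = damping * A + (1 - damping) * A_new
    return (A + R).argmax(axis=1)

def group_people(positions):
    """Cluster people by position (hypothetical helper, assumed features)."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)  # negative squared Euclidean distance
    # Assumed preference: median of the off-diagonal similarities.
    S[np.arange(n), np.arange(n)] = np.median(S[~np.eye(n, dtype=bool)])
    return affinity_propagation(S)

# Made-up positions: four people standing together, three others far away.
positions = [(0, 0), (2, 0.2), (1, 0.4), (1.2, 2),
             (10, 10), (11.5, 10), (10.4, 10.2)]
labels = group_people(positions)
print(labels)  # people in the same group share an exemplar index
```

Unlike k-means, AP does not need the number of groups in advance, which suits scenes where the number of sub-groups varies; the preference value controls how many exemplars emerge. In a pipeline like the one the abstract describes, the similarity would presumably be computed from learned person features rather than raw positions.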