Group activity recognition based on partitioned attention mechanism and interactive position relationship

doi:10.11772/j.issn.1001-9081.2021060904

Journal of Computer Applications, 2022, Vol. 42, Issue (7): 2052-2057. DOI: 10.11772/j.issn.1001-9081.2021060904

Special Issue: Artificial intelligence

Bo LIU, Linbo QING, Zhengyong WANG, Mei LIU, Xue JIANG

College of Electronic and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China

Received: 2021-06-03 Revised: 2021-09-11 Accepted: 2021-09-24 Online: 2021-10-18 Published: 2022-07-10
Contact: Zhengyong WANG
About author: LIU Bo, born in 1997, from Xuchang, Henan, M. S. candidate, CCF member. His research interests include computer vision.
QING Linbo, born in 1982, from Chengdu, Sichuan, Ph. D., associate professor, doctoral supervisor. His research interests include multimedia communication, information systems, artificial intelligence and computer vision.
LIU Mei, born in 1996, from Fuzhou, Jiangxi, M. S. Her research interests include computer vision.
JIANG Xue, born in 1998, from Rizhao, Shandong, M. S. candidate. Her research interests include computer vision.
Supported by: National Natural Science Foundation of China (61871278)

Abstract

Group activity recognition is a challenging task in complex scenes, as it involves the interactions and relative spatial positions of a group of people in the scene. Current group activity recognition methods either lack fine-grained design or do not take full advantage of the interactive features among individuals. Therefore, a network framework based on a partitioned attention mechanism and interactive position relationships was proposed, which further considered the semantic features of individual limbs and explored the relationship between the similarity of interaction features and the consistency of behavior among individuals. Firstly, the original video sequences and optical flow image sequences were used as the input of the network, and a partitioned attention feature module was introduced to refine the limb motion features of individuals. Secondly, the spatial positions and interactive distances were taken as the interaction features of individuals. Finally, the individual motion features and spatial position relation features were fused as the node features of an undirected graph of the group scene, and a Graph Convolutional Network (GCN) was adopted to further capture the activity interactions in the global scene, thereby recognizing the group activity. Experimental results show that this framework achieves recognition accuracies of 92.8% and 97.7% on two group activity recognition datasets, CAD (Collective Activity Dataset) and CAE (Collective Activity Extended Dataset), respectively. Compared with the Actor Relation Graph (ARG) and the Confidence-Energy Recurrent Network (CERN) on the CAD dataset, this framework improves the recognition accuracy by 1.8 and 5.6 percentage points, respectively. Ablation experiments further confirm that the proposed algorithm achieves better recognition performance.

Key words: group activity recognition, attention mechanism, interactive relationship, video understanding, Graph Convolutional Network (GCN)
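The three stages described in the abstract (part-level attention over each person's features, interaction features from positions and distances, and a GCN over the fused person nodes) can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation: the horizontal-strip part split, the softmax scoring vector, the Gaussian distance kernel, the max-pooling readout and every parameter name (`num_parts`, `score_weights`, `sigma`, `weight`) are assumptions made for illustration; only the normalized propagation rule follows the GCN of Kipf and Welling [13].

```python
import math


def partitioned_attention(feature_map, num_parts, score_weights):
    """Collapse one person's feature map (rows x channels) into a vector.

    Hypothetical sketch: rows are split into `num_parts` horizontal strips
    (standing in for body parts), each strip is average-pooled, and the
    strips are mixed with softmax weights from an assumed scoring vector.
    """
    channels = len(feature_map[0])
    size = len(feature_map) // num_parts  # assumes rows divide evenly
    parts = []
    for p in range(num_parts):
        strip = feature_map[p * size:(p + 1) * size]
        parts.append([sum(row[c] for row in strip) / size for c in range(channels)])
    scores = [sum(w * v for w, v in zip(score_weights, part)) for part in parts]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(a * part[c] for a, part in zip(alphas, parts)) for c in range(channels)]


def distance_adjacency(centers, sigma):
    """Edge weights from interaction distance: exp(-d^2 / (2 * sigma^2)).

    The Gaussian kernel is an assumed choice: closer people get stronger
    edges, and the diagonal equals 1, playing the role of the GCN self-loop.
    """
    return [[math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2)) for b in centers]
            for a in centers]


def gcn_layer(adj, features, weight):
    """One propagation step ReLU(D^-1/2 A D^-1/2 H W), after Kipf and Welling."""
    deg = [sum(row) for row in adj]
    n, out_dim = len(adj), len(weight[0])
    # H W: project every node feature with the (hypothetical) weight matrix.
    hw = [[sum(h[k] * weight[k][o] for k in range(len(h))) for o in range(out_dim)]
          for h in features]
    return [[max(0.0, sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * hw[j][o]
                          for j in range(n)))
             for o in range(out_dim)]
            for i in range(n)]


def group_feature(person_maps, centers, num_parts, score_weights, weight, sigma=1.0):
    """Fuse motion and position features as nodes, run one GCN layer, max-pool."""
    nodes = [partitioned_attention(fm, num_parts, score_weights) + list(c)
             for fm, c in zip(person_maps, centers)]
    hidden = gcn_layer(distance_adjacency(centers, sigma), nodes, weight)
    return [max(column) for column in zip(*hidden)]  # pool over individuals


# Toy scene: 3 people, 4x2 feature maps, 2 parts, normalized box centers.
maps = [[[1, 0], [3, 0], [5, 2], [7, 2]]] * 3
centers = [(0.1, 0.2), (0.2, 0.2), (0.9, 0.8)]
weight = [[0.5, 0.0, 0.1], [0.0, 0.5, 0.1], [1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
print(group_feature(maps, centers, 2, [0.3, -0.2], weight, sigma=0.5))
```

In this sketch the kernel width `sigma` controls how quickly interaction strength decays with distance, and the node vector simply concatenates the attended motion feature with the box center; in a trained network the scoring vector and `weight` would be learned rather than fixed by hand.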

CLC Number: TP391.4

Bo LIU, Linbo QING, Zhengyong WANG, Mei LIU, Xue JIANG. Group activity recognition based on partitioned attention mechanism and interactive position relationship[J]. Journal of Computer Applications, 2022, 42(7): 2052-2057.

Figures/Tables 10

References 21

1	TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4489-4497. 10.1109/iccv.2015.510
2	WANG L M, LI W, LI W, et al. Appearance-and-relation networks for video classification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1430-1439. 10.1109/cvpr.2018.00155
3	IBRAHIM M S, MURALIDHARAN S, DENG Z W, et al. A hierarchical deep temporal model for group activity recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1971-1980. 10.1109/cvpr.2016.217
4	CHOI W, SHAHID K, SAVARESE S. What are they doing?: collective activity classification using spatio-temporal relationship among people[C]// Proceedings of the IEEE 12th International Conference on Computer Vision Workshops. Piscataway: IEEE, 2009: 1282-1289. 10.1109/iccvw.2009.5457461
5	BAGAUTDINOV T, ALAHI A, FLEURET F, et al. Social scene understanding: end-to-end multi-person action localization and collective activity recognition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3425-3434. 10.1109/cvpr.2017.365
6	WU J C, WANG L M, WANG L, et al. Learning actor relation graphs for group activity recognition[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9956-9966. 10.1109/cvpr.2019.01020
7	YAN R, TANG J H, SHU X B, et al. Participation-contributed temporal dynamic model for group activity recognition[C]// Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 1292-1300. 10.1145/3240508.3240572
8	YANG X M, FAN L M. Group activity recognition based on regional feature fusion network[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(12): 1116-1121.
9	GONG Y T. Group activity recognition algorithm research based on attention mechanism and deep learning network[D]. Qingdao: Qingdao University of Science and Technology, 2019: 28-29.
10	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2818-2826. 10.1109/cvpr.2016.308
11	HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. 10.1109/iccv.2017.322
12	LU L H, DI H J, LU Y, et al. Spatio-temporal attention mechanisms based model for collective activity recognition[J]. Signal Processing: Image Communication, 2019, 74: 162-174. 10.1016/j.image.2019.02.012
13	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2020-11-16]..
14	QI M S, QIN J, LI A N, et al. StagNet: an attentive semantic RNN for group activity recognition[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11214. Cham: Springer, 2018: 104-120.
15	CHOI W, SHAHID K, SAVARESE S. Learning context for collective activity recognition[C]// Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2011: 3273-3280. 10.1109/cvpr.2011.5995707
16	LI X, CHUAH M C. SBGAR: semantics based group activity recognition[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2895-2904. 10.1109/iccv.2017.313
17	SHU T M, TODOROVIC S, ZHU S C. CERN: confidence-energy recurrent network for group activity recognition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4255-4263. 10.1109/cvpr.2017.453
18	XU D Z, FU H, WU L F, et al. Group activity recognition by using effective multiple modality relation representation with temporal-spatial attention[J]. IEEE Access, 2020, 8: 65689-65698. 10.1109/access.2020.2979742
19	HU G Y, CUI B, HE Y, et al. Progressive relation learning for group activity recognition[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 977-986. 10.1109/cvpr42600.2020.00106
20	DENG Z W, VAHDAT A, HU H X, et al. Structure inference machines: recurrent neural networks for analyzing relations in group activity recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4772-4781. 10.1109/cvpr.2016.516
21	LI W B, CHANG M C, LYU S W. Who did what at where and when: simultaneous multi-person tracking and activity recognition[EB/OL]. (2018-07-03) [2020-10-09].. 10.1016/j.cviu.2021.103301

Comparison with other methods on CAD
Method	Backbone	Accuracy/%
HDTM [3]	AlexNet	81.5
SBGAR [16]	Inception-v3	86.1
CERN [17]	VGG16	87.2
STAGNet [14]	VGG16	89.1
ARG [6]	Inception-v3	91.0
MMRR [18]	Inception-v3	91.2
PRL [19]	VGG16	91.4
Proposed method	Inception-v3	92.8
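The improvements quoted in the abstract are absolute differences between accuracies in this table, expressed in percentage points rather than relative percent. A short Python check, with the table values copied in by hand (the names `CAD_ACCURACY`, `PROPOSED` and `gain_pp` are only illustrative):

```python
# Accuracy (%) on CAD, copied from the comparison table above.
CAD_ACCURACY = {
    "HDTM": 81.5, "SBGAR": 86.1, "CERN": 87.2, "STAGNet": 89.1,
    "ARG": 91.0, "MMRR": 91.2, "PRL": 91.4,
}
PROPOSED = 92.8


def gain_pp(ours, baseline):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(ours - baseline, 1)


for name, acc in CAD_ACCURACY.items():
    print(f"{name}: +{gain_pp(PROPOSED, acc)} pp")
```

The relative gain over ARG would instead be about 2.0% (1.8 / 91.0), which is why the distinction between percentage points and percent matters when reading the abstract.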

Comparison with other methods on CAE
Method	Accuracy/%
RSTV+MRF [15]	82.0
Structure Inference Machines [20]	90.2
Hypergraphs Model [21]	95.1
Proposed method	97.7

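Ablation results on CAD (%)
Method	Crossing	Waiting	Queuing	Walking	Talking	MCA	MPCA
Baseline1	66.76	88.85	100.00	87.36	100.00	88.29	88.594
Baseline2	61.97	88.55	98.90	91.82	100.00	88.50	88.248
Baseline3	74.65	89.31	100.00	90.45	98.90	90.46	90.662
Baseline4	85.92	83.97	100.00	87.73	100.00	91.11	91.524
Proposed method	76.06	93.89	100.00	94.09	100.00	92.81	92.808
The five activity columns give per-class accuracy; MCA is the overall multi-class classification accuracy and MPCA is the mean per-class accuracy.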
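The MPCA column is the unweighted mean of the five per-class accuracies, which can be confirmed directly from the table. MCA cannot be recomputed this way, because it is taken over all test samples and the table does not give per-class sample counts. A small Python check (the dictionary and function names are illustrative):

```python
# Per-class accuracy (%) on CAD: Crossing, Waiting, Queuing, Walking, Talking.
PER_CLASS = {
    "Baseline1": [66.76, 88.85, 100.00, 87.36, 100.00],
    "Baseline2": [61.97, 88.55, 98.90, 91.82, 100.00],
    "Baseline3": [74.65, 89.31, 100.00, 90.45, 98.90],
    "Baseline4": [85.92, 83.97, 100.00, 87.73, 100.00],
    "Proposed": [76.06, 93.89, 100.00, 94.09, 100.00],
}


def mpca(class_accuracies):
    """Mean per-class accuracy: every class counts equally, whatever its size."""
    return sum(class_accuracies) / len(class_accuracies)


for name, accs in PER_CLASS.items():
    print(f"{name}: MPCA = {mpca(accs):.3f}")
```

Because MPCA weights every class equally, a gain on a rare class such as Crossing moves it exactly as much as the same gain on a frequent class, whereas MCA is dominated by the classes with the most test samples.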

Ablation results on CAE
Method	Accuracy/%
Baseline1	90.5
Baseline2	93.6
Baseline3	95.7
Baseline4	96.3
Proposed method	97.7