Video Pedestrian Anomaly Detection Method Based on Skeleton Graph and Mixed Attention (BigData2023_P00320)

LIU Yuhan1, JI Genlin2, ZHANG Hongping1

  1. Nanjing Normal University
  2. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023
  • Received: 2023-08-29 Revised: 2023-09-11 Online: 2023-12-18
  • Contact: JI Genlin
  • Supported by:
    National Natural Science Foundation of China

Abstract: Human skeletons have been widely used in fields such as action recognition. As a topological description of the human body, the skeleton is robust to illumination changes and background noise, which makes it well suited to video pedestrian anomaly detection. In recent years, many studies have built detection models on spatio-temporal graph convolutional networks. However, these methods usually model the strength of skeleton connections only between directly connected joints, so they attend to a small range of motion and ignore local features, and accurately detecting abnormal pedestrian events remains difficult. This paper therefore proposes PAD-SGMA, a video pedestrian anomaly detection algorithm based on a skeleton graph and mixed attention. First, the method extends the associations between skeleton joints by connecting the root node to joints it is not directly linked to, and it partitions the skeleton graph into regions to obtain local skeleton features. The graph convolution module then captures hierarchical representations using a static global skeleton, local regional skeletons, and an attention-based adjacency matrix. Second, a new spatio-temporal-channel mixed-attention graph convolutional network is proposed. It adds a mixed attention module that models spatial and channel relationships, helping the model strengthen discriminative features and give each joint a different degree of attention. Experiments on a large-scale public benchmark, the ShanghaiTech Campus dataset, show that PAD-SGMA achieves higher accuracy than the compared methods.

Key words: video anomaly detection, deep learning, human skeleton, graph convolutional network, attention
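The abstract names several ingredients: a skeleton graph whose root is linked to joints it does not touch directly, a partition of the skeleton into local regions, a graph convolution over a static global, a local-region, and an attention-based adjacency, and a mixed attention step over joints and channels. The sketch below shows one way these pieces could fit together in plain Python. It is not the authors' implementation. The 7-joint toy skeleton, the root index, the region partition, fully connecting joints inside each region, the dot-product softmax used as the attention-based adjacency, summing the per-adjacency responses, and the parameter-free sigmoid gates standing in for the mixed attention module are all hypothetical choices made for illustration.

```python
import math

Matrix = list[list[float]]


def adjacency(n: int, edges: list[tuple[int, int]]) -> Matrix:
    """Symmetric 0/1 adjacency matrix from undirected bone edges."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def extend_root(n: int, edges: list[tuple[int, int]], root: int) -> list[tuple[int, int]]:
    """Link the root joint to every joint it is not already directly connected to."""
    linked = {j for i, j in edges if i == root} | {i for i, j in edges if j == root}
    return edges + [(root, v) for v in range(n) if v != root and v not in linked]


def region_adjacency(n: int, regions: list[set[int]]) -> Matrix:
    """Assumed local graph: connect every pair of joints that share a body region."""
    a = [[0.0] * n for _ in range(n)]
    for region in regions:
        for i in region:
            for j in region:
                if i != j:
                    a[i][j] = 1.0
    return a


def normalize(a: Matrix) -> Matrix:
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(a)
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_adjacency(x: Matrix) -> Matrix:
    """Hypothetical data-dependent adjacency: row-wise softmax of joint-feature dot products."""
    scores = matmul(x, [list(col) for col in zip(*x)])
    out = []
    for row in scores:
        m = max(row)
        e = [math.exp(s - m) for s in row]
        out.append([v / sum(e) for v in e])
    return out


def graph_conv(x: Matrix, adjs: list[Matrix], w: Matrix) -> Matrix:
    """Sum the responses of several adjacencies: sum_k A_k X W."""
    xw = matmul(x, w)
    out = [[0.0] * len(w[0]) for _ in x]
    for a in adjs:
        for v, row in enumerate(matmul(a, xw)):
            for c, val in enumerate(row):
                out[v][c] += val
    return out


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mixed_attention(h: Matrix) -> Matrix:
    """Parameter-free stand-in: gate features per channel and per joint (spatial)."""
    n, c = len(h), len(h[0])
    channel_gate = [sigmoid(sum(h[v][k] for v in range(n)) / n) for k in range(c)]
    joint_gate = [sigmoid(sum(row) / c) for row in h]
    return [[h[v][k] * channel_gate[k] * joint_gate[v] for k in range(c)] for v in range(n)]


# Hypothetical 7-joint toy skeleton: 0 pelvis (root), 1 spine, 2 head,
# 3/4 hands, 5/6 feet. Not the joint layout used in the paper.
N, ROOT = 7, 0
BONES = [(0, 1), (1, 2), (1, 3), (1, 4), (0, 5), (0, 6)]
REGIONS = [{0, 1, 2}, {1, 3, 4}, {0, 5, 6}]  # torso, arms, legs (assumed)

a_global = normalize(adjacency(N, extend_root(N, BONES, ROOT)))
a_local = normalize(region_adjacency(N, REGIONS))
x = [[float(v), 1.0] for v in range(N)]   # toy input: 7 joints x 2 channels
w = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]  # hypothetical 2 -> 3 channel weights
h = graph_conv(x, [a_global, a_local, attention_adjacency(x)], w)
y = mixed_attention(h)
print(len(y), len(y[0]))  # 7 3
```

In a trained model the weights and the attention projections would be learned, a temporal convolution would run across frames, and a deep-learning framework would replace the list-based matrix code. The abstract does not specify how PAD-SGMA fuses the three adjacencies, so the plain sum here is only one plausible option.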
