Anomaly detection in video via independently recurrent neural network and variational autoencoder network

Journal of Computer Applications, 2023, Vol. 43, Issue (2): 507-513. DOI: 10.11772/j.issn.1001-9081.2021122081

Special topic: Multimedia computing and computer simulation

Qing JIA, Laihua WANG, Weisheng WANG

School of Cyber Science and Engineering, Qufu Normal University, Qufu Shandong 273165, China

Received: 2021-12-09 Revised: 2022-04-13 Accepted: 2022-05-13 Online: 2022-06-13 Published: 2023-02-10
Contact: Qing JIA
About the authors: WANG Laihua, born in 1988 in Liaocheng, Shandong, Ph. D., associate professor. Her research interests include digital image processing and video anomaly detection.
WANG Weisheng, born in 1997 in Yantai, Shandong, M. S. candidate. His research interests include video anomaly detection and computer vision.
Supported by: National Natural Science Foundation of China (61601261)

Abstract

To effectively extract the temporal information between consecutive video frames, a prediction network, IndRNN-VAE, that fuses an Independently Recurrent Neural Network (IndRNN) with a Variational AutoEncoder (VAE) network was proposed. Firstly, the spatial information of video frames was extracted by the VAE network, and the latent features of the frames were obtained through a linear transformation. Secondly, the latent features were used as the input of the IndRNN to obtain the temporal information of the video frame sequence. Finally, the latent features and the temporal information were fused through a residual block and input to the decoding network to generate the predicted frame. Experimental results on the UCSD Ped1, UCSD Ped2 and Avenue public datasets show that, compared with existing anomaly detection methods, the IndRNN-VAE-based method improves performance significantly: the Area Under Curve (AUC) values reach 84.3%, 96.2% and 86.6% respectively, the Equal Error Rate (EER) values reach 22.7%, 8.8% and 19.0% respectively, and the differences in mean anomaly scores reach 0.263, 0.497 and 0.293 respectively. In addition, the method runs at 28 Frames Per Second (FPS).

Key words: video anomaly detection, video surveillance, Variational AutoEncoder (VAE), Independently Recurrent Neural Network (IndRNN), feature extraction
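
The temporal step described in the abstract can be illustrated with a minimal sketch of the IndRNN recurrence from reference [17], in which each hidden unit keeps its own scalar recurrent weight: h_t = relu(W x_t + u * h_{t-1} + b), with `*` taken element-wise. The function names, dimensions and weight values below are hypothetical and chosen only for illustration; this page does not give the layer sizes or training details used in IndRNN-VAE.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: h_t = relu(W @ x_t + u * h_prev + b).

    W is a (hidden x input) matrix given as a list of rows.
    u holds one recurrent weight per hidden unit, so unit i only
    sees its own previous state h_prev[i] (no hidden-to-hidden matrix).
    """
    pre_activation = []
    for i in range(len(u)):
        value = sum(W[i][j] * x_t[j] for j in range(len(x_t)))
        value += u[i] * h_prev[i] + b[i]
        pre_activation.append(value)
    return relu(pre_activation)


def indrnn_run(latent_sequence, W, u, b):
    """Run the cell over a sequence of per-frame latent vectors."""
    h = [0.0] * len(u)  # zero initial state (an assumption)
    states = []
    for x_t in latent_sequence:
        h = indrnn_step(x_t, h, W, u, b)
        states.append(h)
    return states


# Hypothetical 2-dimensional latent features for two consecutive frames.
W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, -1.0]
b = [0.0, 0.0]
states = indrnn_run([[1.0, 1.0], [1.0, 1.0]], W, u, b)
```

Because the recurrent weight is a vector rather than a matrix, each unit's memory of past frames is controlled independently by one scalar.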

CLC number: TP391.41

Qing JIA, Laihua WANG, Weisheng WANG. Anomaly detection in video via independently recurrent neural network and variational autoencoder network[J]. Journal of Computer Applications, 2023, 43(2): 507-513.

Figures and tables (14)

Fig. 1 Overall flowchart of video anomaly detection

Fig. 2 IndRNN-VAE network model

Fig. 3 Cyclic IndRNN

Fig. 4 Schematic diagram of model training structure

Fig. 5 Examples of abnormal events in different datasets

Tab. 1 Comparison of AUC and EER values of related anomaly detection methods (%)

Method	Type	Ped1 AUC	Ped1 EER	Ped2 AUC	Ped2 EER	Avenue AUC	Avenue EER
Conv-AE [9]	Frame reconstruction	75.0	27.9	85.0	21.7	80.0	23.0
Unmask [10]	Frame reconstruction	68.4	-	82.2	-	80.6	-
FP [11]	Frame prediction	83.1	-	95.4	-	84.9	-
AD [12]	Frame prediction	83.9	-	96.0	-	86.0	-
GMFC-VAE [13]	Frame reconstruction	94.9	11.3	92.2	12.6	83.4	22.7
R-STAE [14]	Frame reconstruction	-	-	83.0	-	82.0	-
R-VAE [16]	Frame reconstruction	75.0	32.4	91.0	15.5	79.6	27.5
Proposed method	Frame prediction	84.3	22.7	96.2	8.8	86.6	19.0
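
The AUC and EER columns can be reproduced in principle from per-frame anomaly scores and ground-truth labels. The following is a minimal sketch of one common way to compute both metrics, assuming that a higher score means "more anomalous" and that label 1 marks an abnormal frame; the function names and the sample data are hypothetical, and the exact evaluation protocol used in the paper is not stated on this page.

```python
def auc_score(scores, labels):
    """AUC as the probability that a random abnormal frame
    scores higher than a random normal frame (ties count 0.5)."""
    abnormal = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal) * len(normal))


def equal_error_rate(scores, labels):
    """Approximate EER: sweep every observed score as a threshold and
    return the error rate where false-positive and false-negative
    rates are closest to each other."""
    n_abnormal = sum(1 for y in labels if y == 1)
    n_normal = len(labels) - n_abnormal
    best_gap, best_eer = None, None
    for threshold in sorted(set(scores)):
        # A frame is flagged as abnormal when its score >= threshold.
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        fpr, fnr = fp / n_normal, fn / n_abnormal
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


# Hypothetical frame-level scores and labels (1 = abnormal).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = auc_score(scores, labels)
eer = equal_error_rate(scores, labels)
```

A higher AUC and a lower EER both indicate better separation of normal and abnormal frames, which is how the rows of Tab. 1 should be read.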

Tab. 2 Time performance comparison of related anomaly detection methods

Method	FPS
Unmask [10]	20
FP [11]	25
R-STAE [14]	14
Proposed method	28

Tab. 3 Comparison of the difference value ΔS on different datasets

Method	Ped1	Ped2	Avenue
Conv-AE [9]	0.243	0.384	0.256
FP [11]	0.259	0.469	0.275
Proposed method	0.263	0.497	0.293
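
The abstract describes ΔS as the difference between mean anomaly scores. A minimal sketch of that quantity, assuming it is the absolute gap between the mean score of normal frames and the mean score of abnormal frames, is given below. The sign convention, any score normalization, the function name and the sample data are assumptions not confirmed by this page.

```python
def mean_score_gap(scores, labels):
    """Absolute gap between the mean scores of normal and abnormal frames.

    labels uses 0 for normal frames and 1 for abnormal frames.
    """
    normal = [s for s, y in zip(scores, labels) if y == 0]
    abnormal = [s for s, y in zip(scores, labels) if y == 1]
    mean_normal = sum(normal) / len(normal)
    mean_abnormal = sum(abnormal) / len(abnormal)
    return abs(mean_normal - mean_abnormal)


# Hypothetical per-frame scores: two normal frames, two abnormal frames.
delta_s = mean_score_gap([0.9, 0.8, 0.2, 0.4], [0, 0, 1, 1])
```

Under this reading, a larger ΔS means the two groups of frames are easier to tell apart.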

Fig. 6 Anomaly scores of video frames on Ped1 dataset

Fig. 7 Anomaly scores of video frames on Ped2 dataset

Fig. 8 Anomaly scores of video frames on Avenue dataset

Fig. 9 AUC values for different numbers of layers on Ped2 dataset

Tab. 4 Performance of different module combinations in the network (%)

Method	AUC	EER
Base	94.0	12.4
Base+IndRNN	95.6	10.9
Base+IndRNN+GAN	96.2	8.8

Tab. 5 Performance of different loss function combinations in the network (%)

Loss functions	AUC
Gradient loss + multi-scale structural similarity loss	93.9
Gradient loss + mixed loss	95.9
Gradient loss + mixed loss + total variation loss	96.2

References (21)

1	HU Z P, ZHANG L, LI S F, et al. Review of abnormal behavior detection and location for intelligent video surveillance systems[J]. Journal of Yanshan University, 2019, 43(1): 1-12. DOI: 10.3969/j.issn.1007-791X.2019.01.001
2	ZHENG B B, FAN X N, LI M, et al. Trajectory segment-based abnormal behavior detection method using LDA model[J]. Journal of Computer Applications, 2015, 35(2): 515-518, 565. DOI: 10.11772/j.issn.1001-9081.2015.02.0515
3	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1. Piscataway: IEEE, 2005: 886-893. DOI: 10.1109/cvpr.2005.4
4	DALAL N, TRIGGS B, SCHMID C. Human detection using oriented histograms of flow and appearance[C]// Proceedings of the 2006 European Conference on Computer Vision, LNCS 3952. Berlin: Springer, 2006: 428-441.
5	CHAN A B, VASCONCELOS N. Modeling, clustering, and segmenting video with mixtures of dynamic textures[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(5): 909-926. DOI: 10.1109/tpami.2007.70738
6	MEHRAN R, OYAMA A, SHAH M. Abnormal crowd behavior detection using social force model[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 935-942. DOI: 10.1109/cvpr.2009.5206641
7	LI M, LIU K, LUO H Q, et al. Anomaly detection algorithm improvement based on Gaussian mixture model[J]. Computer Applications and Software, 2014, 31(6): 198-200. DOI: 10.3969/j.issn.1000-386x.2014.06.054
8	XU T, TIAN C Y, LIU C H. Deep learning for abnormal crowd behavior detection: a review[J]. Computer Science, 2021, 48(9): 125-134. DOI: 10.11896/jsjkx.201100015
9	HASAN M, CHOI J, NEUMANN J, et al. Learning temporal regularity in video sequences[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 733-742. DOI: 10.1109/cvpr.2016.86
10	IONESCU R, SMEUREANU S, ALEXE B, et al. Unmasking the abnormal events in video[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2914-2922. DOI: 10.1109/iccv.2017.315
11	LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection - a new baseline[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6536-6545. DOI: 10.1109/cvpr.2018.00684
12	ZHOU J T, ZHANG L, FANG Z W, et al. Attention-driven loss for anomaly detection in video surveillance[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(12): 4639-4647. DOI: 10.1109/tcsvt.2019.2962229
13	FAN Y X, WEN G J, LI D R, et al. Video anomaly detection and localization via Gaussian mixture fully convolutional variational autoencoder[J]. Computer Vision and Image Understanding, 2020, 195: No.102920. DOI: 10.1016/j.cviu.2020.102920
14	DEEPAK K, CHANDRAKALA S, MOHAN C K. Residual spatiotemporal autoencoder for unsupervised video anomaly detection[J]. Signal, Image and Video Processing, 2021, 15(1): 215-222. DOI: 10.1007/s11760-020-01740-1
15	NAWARATNE R, ALAHAKOON D, DE SILVA D, et al. Spatiotemporal anomaly detection using deep learning for real-time video surveillance[J]. IEEE Transactions on Industrial Informatics, 2020, 16(1): 393-402. DOI: 10.1109/tii.2019.2938527
16	YAN S Y, SMITH J S, LU W J, et al. Abnormal event detection from videos using a two-stream recurrent variational autoencoder[J]. IEEE Transactions on Cognitive and Developmental Systems, 2020, 12(1): 30-42. DOI: 10.1109/tcds.2018.2883368
17	LI S, LI W Q, COOK C, et al. Independently Recurrent Neural Network (IndRNN): building a longer and deeper RNN[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5457-5466. DOI: 10.1109/cvpr.2018.00572
18	KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. (2014-05-01)[2021-11-01]. DOI: 10.1561/2200000056
19	MAKHZANI A, SHLENS J, JAITLY N, et al. Adversarial autoencoders[EB/OL]. (2016-05-25)[2021-11-01].
20	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 2672-2680.
21	MAHENDRAN A, VEDALDI A. Understanding deep image representations by inverting them[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 5188-5196. DOI: 10.1109/cvpr.2015.7299155
