Anomaly detection in video via independently recurrent neural network and variational autoencoder network

doi:10.11772/j.issn.1001-9081.2021122081

Abstract:

To effectively extract the temporal information between consecutive video frames, a prediction network, IndRNN-VAE (Independently Recurrent Neural Network-Variational AutoEncoder), that fuses an Independently Recurrent Neural Network (IndRNN) with a Variational AutoEncoder (VAE) network was proposed. Firstly, the spatial information of the video frames was extracted by the VAE network, and the latent features of the frames were obtained through a linear transformation. Secondly, the latent features were used as the input of the IndRNN to obtain the temporal information of the video frame sequence. Finally, the latent features and the temporal information were fused through a residual block and fed into the decoding network to generate the predicted frame. Experiments on the public UCSD Ped1, UCSD Ped2 and Avenue datasets show that, compared with existing anomaly detection methods, the IndRNN-VAE-based method improves performance significantly: its Area Under Curve (AUC) values reach 84.3%, 96.2% and 86.6% respectively, its Equal Error Rate (EER) values reach 22.7%, 8.8% and 19.0% respectively, and its differences in mean anomaly score reach 0.263, 0.497 and 0.293 respectively. In addition, the method runs at 28 FPS (Frames Per Second).

Key words: video anomaly detection, video surveillance, Variational AutoEncoder (VAE), Independently Recurrent Neural Network (IndRNN), feature extraction
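
The temporal step described in the abstract can be illustrated with a minimal NumPy sketch. It uses the IndRNN recurrence h_t = ReLU(W x_t + u * h_(t-1) + b) from reference [17], in which each neuron has its own scalar recurrent weight. The VAE encoder and decoder are omitted, and the function names, the dimensions, the random inputs and the additive form of the residual fusion are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def indrnn_forward(z_seq, W, u, b):
    """Run one IndRNN layer over a sequence of latent vectors.

    z_seq: (T, d_in) latent features, one row per frame (hypothetical input).
    W:     (d_in, d_hid) input weights.
    u:     (d_hid,) recurrent weights, one scalar per neuron.
    b:     (d_hid,) bias.
    Returns the hidden states, shape (T, d_hid).
    """
    T = z_seq.shape[0]
    h = np.zeros(u.shape[0])
    states = np.empty((T, u.shape[0]))
    for t in range(T):
        # Element-wise recurrence: each neuron only sees its own previous state.
        h = np.maximum(0.0, z_seq[t] @ W + u * h + b)
        states[t] = h
    return states

def fuse_residual(z_last, h_last):
    # Assumption: the residual fusion is a plain additive skip connection.
    return z_last + h_last

rng = np.random.default_rng(0)
T, d = 4, 8                      # 4 input frames, 8-dimensional latent features
z_seq = rng.normal(size=(T, d))  # stand-in for the VAE latent features
W = rng.normal(scale=0.1, size=(d, d))
u = rng.uniform(-1.0, 1.0, size=d)
b = np.zeros(d)

states = indrnn_forward(z_seq, W, u, b)
fused = fuse_residual(z_seq[-1], states[-1])  # would be passed to the decoder
print(states.shape, fused.shape)
```

In this sketch the fused vector has the same dimension as the latent features, which is what allows the additive skip connection; a different fusion would need a projection.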

CLC Number:

TP391.41

Qing JIA, Laihua WANG, Weisheng WANG. Anomaly detection in video via independently recurrent neural network and variational autoencoder network[J]. Journal of Computer Applications, 2023, 43(2): 507-513.

References 21

1	HU Z P, ZHANG L, LI S F, et al. Review of abnormal behavior detection and location for intelligent video surveillance systems[J]. Journal of Yanshan University, 2019, 43(1): 1-12. doi:10.3969/j.issn.1007-791X.2019.01.001
2	ZHENG B B, FAN X N, LI M, et al. Trajectory segment-based abnormal behavior detection method using LDA model[J]. Journal of Computer Applications, 2015, 35(2): 515-518, 565. doi:10.11772/j.issn.1001-9081.2015.02.0515
3	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1. Piscataway: IEEE, 2005: 886-893. doi:10.1109/cvpr.2005.4
4	DALAL N, TRIGGS B, SCHMID C. Human detection using oriented histograms of flow and appearance[C]// Proceedings of the 2006 European Conference on Computer Vision, LNCS 3952. Berlin: Springer, 2006: 428-441.
5	CHAN A B, VASCONCELOS N. Modeling, clustering, and segmenting video with mixtures of dynamic textures[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(5): 909-926. doi:10.1109/tpami.2007.70738
6	MEHRAN R, OYAMA A, SHAH M. Abnormal crowd behavior detection using social force model[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 935-942. doi:10.1109/cvpr.2009.5206641
7	LI M, LIU K, LUO H Q, et al. Anomaly detection algorithm improvement based on Gaussian mixture model[J]. Computer Applications and Software, 2014, 31(6): 198-200. doi:10.3969/j.issn.1000-386x.2014.06.054
8	XU T, TIAN C Y, LIU C H. Deep learning for abnormal crowd behavior detection: a review[J]. Computer Science, 2021, 48(9): 125-134. doi:10.11896/jsjkx.201100015
9	HASAN M, CHOI J, NEUMANN J, et al. Learning temporal regularity in video sequences[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 733-742. doi:10.1109/cvpr.2016.86
10	IONESCU R, SMEUREANU S, ALEXE B, et al. Unmasking the abnormal events in video[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2914-2922. doi:10.1109/iccv.2017.315
11	LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection - a new baseline[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6536-6545. doi:10.1109/cvpr.2018.00684
12	ZHOU J T, ZHANG L, FANG Z W, et al. Attention-driven loss for anomaly detection in video surveillance[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(12): 4639-4647. doi:10.1109/tcsvt.2019.2962229
13	FAN Y X, WEN G J, LI D R, et al. Video anomaly detection and localization via Gaussian mixture fully convolutional variational autoencoder[J]. Computer Vision and Image Understanding, 2020, 195: No.102920. doi:10.1016/j.cviu.2020.102920
14	DEEPAK K, CHANDRAKALA S, MOHAN C K. Residual spatiotemporal autoencoder for unsupervised video anomaly detection[J]. Signal, Image and Video Processing, 2021, 15(1): 215-222. doi:10.1007/s11760-020-01740-1
15	NAWARATNE R, ALAHAKOON D, DE SILVA D, et al. Spatiotemporal anomaly detection using deep learning for real-time video surveillance[J]. IEEE Transactions on Industrial Informatics, 2020, 16(1): 393-402. doi:10.1109/tii.2019.2938527
16	YAN S Y, SMITH J S, LU W J, et al. Abnormal event detection from videos using a two-stream recurrent variational autoencoder[J]. IEEE Transactions on Cognitive and Developmental Systems, 2020, 12(1): 30-42. doi:10.1109/tcds.2018.2883368
17	LI S, LI W Q, COOK C, et al. Independently Recurrent Neural Network (IndRNN): building a longer and deeper RNN[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5457-5466. doi:10.1109/cvpr.2018.00572
18	KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. (2014-05-01)[2021-11-01]. doi:10.1561/2200000056
19	MAKHZANI A, SHLENS J, JAITLY N, et al. Adversarial autoencoders[EB/OL]. (2016-05-25)[2021-11-01].
20	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 2672-2680.
21	MAHENDRAN A, VEDALDI A. Understanding deep image representations by inverting them[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 5188-5196. doi:10.1109/cvpr.2015.7299155

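AUC and EER (%) by method on the UCSD Ped1, UCSD Ped2 and Avenue datasets
Method	Type	Ped1 AUC	Ped1 EER	Ped2 AUC	Ped2 EER	Avenue AUC	Avenue EER
Conv-AE [9]	Frame reconstruction	75.0	27.9	85.0	21.7	80.0	23.0
Unmask [10]	Frame reconstruction	68.4	—	82.2	—	80.6	—
FP [11]	Frame prediction	83.1	—	95.4	—	84.9	—
AD [12]	Frame prediction	83.9	—	96.0	—	86.0	—
GMFC-VAE [13]	Frame reconstruction	94.9	11.3	92.2	12.6	83.4	22.7
R-STAE [14]	Frame reconstruction	—	—	83.0	—	82.0	—
R-VAE [16]	Frame reconstruction	75.0	32.4	91.0	15.5	79.6	27.5
Proposed method	Frame prediction	84.3	22.7	96.2	8.8	86.6	19.0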
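
As a reading aid for the AUC and EER columns above, the following sketch shows one common way to compute both metrics at frame level from per-frame anomaly scores and ground-truth labels. The source does not describe its evaluation code, so the function names, the convention that a higher score means more anomalous, the nearest-point approximation of the EER and the toy data are assumptions.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) swept over all thresholds; high score = anomalous."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    scores, labels = scores[order], labels[order]
    tp = np.cumsum(labels)        # abnormal frames flagged so far
    fp = np.cumsum(1 - labels)    # normal frames flagged so far
    # Keep only the last index of each run of tied scores.
    distinct = np.r_[np.diff(scores) != 0, True]
    tpr = np.r_[0.0, tp[distinct] / labels.sum()]
    fpr = np.r_[0.0, fp[distinct] / (1 - labels).sum()]
    return fpr, tpr

def auc_and_eer(scores, labels):
    fpr, tpr = roc_points(scores, labels)
    # Area under the ROC curve by the trapezoidal rule.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # EER: the ROC point where the false positive rate and the
    # false negative rate are closest (approximation without interpolation).
    fnr = 1.0 - tpr
    i = int(np.argmin(np.abs(fpr - fnr)))
    eer = float((fpr[i] + fnr[i]) / 2.0)
    return auc, eer

# Toy data: six frames, label 1 marks an abnormal frame.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 0, 1, 1, 0, 0]
auc, eer = auc_and_eer(toy_scores, toy_labels)
print(f"AUC = {auc:.3f}, EER = {eer:.3f}")
```

A higher AUC and a lower EER both indicate better separation of normal and abnormal frames, which is how the two columns of each dataset should be read together.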

Difference in mean anomaly score by method and dataset
Method	Ped1	Ped2	Avenue
Conv-AE [9]	0.243	0.384	0.256
FP [11]	0.259	0.469	0.275
Proposed method	0.263	0.497	0.293
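
The quantity in the table above can be read as the gap between the average score of abnormal frames and the average score of normal frames; a larger gap means the two groups are easier to separate. The source does not give the exact formula, so the sketch below assumes scores normalized to [0, 1] with higher values for abnormal frames, and the function name and toy data are hypothetical.

```python
import numpy as np

def mean_score_gap(scores, labels):
    """Mean score of abnormal frames minus mean score of normal frames."""
    scores = np.asarray(scores, dtype=float)
    is_abnormal = np.asarray(labels, dtype=bool)
    return float(scores[is_abnormal].mean() - scores[~is_abnormal].mean())

# Toy data: six frames, label 1 marks an abnormal frame.
gap = mean_score_gap([0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 0, 1, 1, 0, 0])
print(f"gap = {gap:.3f}")
```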

AUC and EER (%) by network configuration
Method	AUC	EER
Base	94.0	12.4
Base+IndRNN	95.6	10.9
Base+IndRNN+GAN	96.2	8.8