Journal of Computer Applications (《计算机应用》), 2025, Vol. 45, Issue 4: 1300-1309. DOI: 10.11772/j.issn.1001-9081.2024040519

• Multimedia computing and computer simulation •

面向运动前景区域的视频异常检测

潘理虎, 彭守信, 张睿, 薛之洋, 毛旭珍

  1. 太原科技大学 计算机科学与技术学院,太原 030024
  • 收稿日期:2024-04-25 修回日期:2024-09-12 接受日期:2024-09-14 发布日期:2025-04-08 出版日期:2025-04-10
  • 通讯作者: 潘理虎
  • 作者简介:彭守信(1998—),男,江西九江人,硕士研究生,主要研究方向:深度学习、视频异常检测
    张睿(1987—),男,山西太原人,副教授,博士,主要研究方向:智能信息处理、自动机器学习
    薛之洋(1999—),男,山西太原人,硕士研究生,主要研究方向:人工智能、目标检测
    毛旭珍(1995—),女,山西吕梁人,硕士研究生,主要研究方向:视频异常检测。
  • 基金资助:
    山西省基础研究计划项目(202203021221145);山西省研究生联合培养示范基地项目(2022JD11)

Video anomaly detection for moving foreground regions

Lihu PAN, Shouxin PENG, Rui ZHANG, Zhiyang XUE, Xuzhen MAO

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China
  • Received: 2024-04-25 Revised: 2024-09-12 Accepted: 2024-09-14 Online: 2025-04-08 Published: 2025-04-10
  • Contact: Lihu PAN
  • About the authors: PENG Shouxin, born in 1998, M.S. candidate. His research interests include deep learning and video anomaly detection.
    ZHANG Rui, born in 1987, Ph.D., associate professor. His research interests include intelligent information processing and automated machine learning.
    XUE Zhiyang, born in 1999, M.S. candidate. His research interests include artificial intelligence and object detection.
    MAO Xuzhen, born in 1995, M.S. candidate. Her research interests include video anomaly detection.
  • Supported by:
    Shanxi Provincial Basic Research Program (202203021221145); Shanxi Province Graduate Joint Cultivation Demonstration Base Project (2022JD11)

摘要:

静态背景信息和运动前景对象的数据分布不平衡通常会引起发生异常的前景区域信息学习不充分问题,进而影响视频异常检测(VAD)的精度。为了解决上述问题,提出一种用于VAD的嵌套U型帧预测生成对抗网络(NUFP-GAN)方法。所提方法使用具有突出视频帧中显著目标能力的嵌套U型帧预测网络架构作为帧预测模块,并在判别阶段设计一个自注意力补丁判别器,应用不同大小的感受野提取视频帧中更重要的外观和运动特征,以提升异常检测的准确性。此外,为保证预测帧和真实帧在高级语义信息上的多尺度特征一致性,引入多尺度一致性损失,以进一步提升方法的异常检测效果。实验结果表明,所提方法在CUHK Avenue、UCSD Ped1、UCSD Ped2和ShanghaiTech数据集上的曲线下面积(AUC)值分别达到了87.6%、85.2%、96.0%和73.3%;与MAMC (Memory-enhanced Appearance-Motion Consistency)方法相比,所提方法在ShanghaiTech数据集上的AUC值提升了1.8个百分点。可见,所提方法能够有效应对VAD中数据分布不平衡带来的挑战。

关键词: 深度学习, 视频异常检测, 生成对抗网络, 未来帧预测, 无监督学习

Abstract:

An imbalance in data distribution between static background information and moving foreground objects often leads to insufficient learning of the foreground regions where anomalies occur, which reduces the accuracy of Video Anomaly Detection (VAD). To address this problem, a Nested U-shaped Frame Predictive Generative Adversarial Network (NUFP-GAN) is proposed for VAD. The method uses a nested U-shaped frame prediction network, an architecture able to highlight salient objects in video frames, as its frame prediction module. For the discrimination stage, a self-attention patch discriminator is designed that applies receptive fields of different sizes to extract the more important appearance and motion features from video frames, improving detection accuracy. In addition, a multi-scale consistency loss is introduced to keep the multi-scale features of predicted frames and real frames consistent in high-level semantic information, further improving detection performance. Experimental results show that the method achieves Area Under Curve (AUC) values of 87.6%, 85.2%, 96.0%, and 73.3% on the CUHK Avenue, UCSD Ped1, UCSD Ped2, and ShanghaiTech datasets, respectively; on the ShanghaiTech dataset, its AUC is 1.8 percentage points higher than that of the Memory-enhanced Appearance-Motion Consistency (MAMC) method. These results indicate that the method effectively addresses the challenge that data distribution imbalance poses in VAD.
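The results above are reported as AUC values. As a minimal sketch of how a frame-level AUC can be obtained from per-frame anomaly scores, the snippet below uses the pairwise-ranking identity: AUC is the fraction of (abnormal, normal) frame pairs in which the abnormal frame receives the higher score, with ties counted as one half. The function name `frame_level_auc` and the sample scores and labels are hypothetical illustrations; this is not the authors' evaluation code, and the scoring rule that produces the per-frame scores is not specified here.

```python
def frame_level_auc(scores, labels):
    """Area under the ROC curve from per-frame anomaly scores.

    scores: higher means "more anomalous" (assumed convention).
    labels: 1 for abnormal frames, 0 for normal frames.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal frame")

    # Count abnormal/normal pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six frames, three of them labelled abnormal.
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.30]
labels = [0, 0, 1, 1, 0, 1]
auc = frame_level_auc(scores, labels)
print(f"frame-level AUC = {auc:.3f}")
```

The double loop is quadratic in the number of frames, which is acceptable for illustration; a sort-based computation would be preferable for full test sets.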

Key words: deep learning, Video Anomaly Detection (VAD), Generative Adversarial Network (GAN), future frame prediction, unsupervised learning

CLC number: