Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2997-3003. DOI: 10.11772/j.issn.1001-9081.2020121906

Special topic: Multimedia computing and computer simulation


Video abnormal behavior detection based on dual prediction model of appearance and motion features

LI Ziqiang1, WANG Zhengyong1, CHEN Honggang1, LI Linyi2, HE Xiaohai1

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China;
  2. The Second Research Institute of Civil Aviation Administration of China, Chengdu Sichuan 610041, China
  • Received: 2020-12-08 Revised: 2021-04-26 Online: 2021-10-10 Published: 2021-07-16
  • Corresponding author: CHEN Honggang
  • About the authors: LI Ziqiang (李自强), born in 1998, male, from Jiujiang, Jiangxi, M. S. candidate; research interests: computer vision, deep learning, abnormal behavior detection. WANG Zhengyong (王正勇), born in 1969, female, from Dazhou, Sichuan, Ph. D., associate professor; research interests: image processing, pattern recognition. CHEN Honggang (陈洪刚), born in 1991, male, from Dazhou, Sichuan, Ph. D., assistant research fellow; research interests: image processing. LI Linyi (李林怡), born in 1996, female, from Neijiang, Sichuan; research interests: intelligent operation control. HE Xiaohai (何小海), born in 1964, male, from Mianyang, Sichuan, Ph. D., professor; research interests: image processing, pattern recognition, image communication.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61871278) and the Sichuan Science and Technology Program (2019YFH0034).

Abstract: In order to make fuller use of appearance and motion information in video abnormal behavior detection, a Siamese network model that captures appearance and motion information at the same time was proposed. The two branches of the network shared the same autoencoder structure: the appearance sub-network took several consecutive RGB frames as input to predict the next frame, while the motion sub-network took RGB frame difference images as input to predict the future frame difference image. In addition, one of the factors limiting the detection performance of prediction-based methods is the diversity of normal samples combined with the powerful "generation" ability of the autoencoder network, which also predicts some abnormal samples well. Therefore, a memory enhancement module that learns and stores the "prototype" features of normal samples was added between the encoder and the decoder, so that abnormal samples yield larger prediction errors. Extensive experiments were conducted on three public anomaly detection datasets: Avenue, UCSD-ped2 and ShanghaiTech. Experimental results show that the proposed method outperforms other video abnormal behavior detection methods based on reconstruction or prediction. Specifically, the average Area Under Curve (AUC) of the proposed method on the Avenue, UCSD-ped2 and ShanghaiTech datasets reaches 88.2%, 97.5% and 73.0%, respectively.

Key words: abnormal behavior detection, video surveillance, autoencoder, memory enhancement, Siamese network
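
The three ideas in the abstract (frame differences as motion input, a memory of normal "prototype" features, and prediction error as the anomaly signal) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify how the memory is addressed or how the two branch errors are combined, so the cosine-similarity softmax read, the mean squared error, and the `alpha` weighting below are all assumptions, and every function name is hypothetical. Frames and features are flat lists of numbers for brevity.

```python
import math


def frame_differences(frames):
    """Motion-branch input: element-wise difference of consecutive frames."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def memory_read(query, memory):
    """Assumed addressing scheme: rebuild an encoder feature as a
    softmax-weighted mix of the stored normal prototypes."""
    sims = [cosine(query, item) for item in memory]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * item[i] for w, item in zip(weights, memory))
            for i in range(len(query))]


def prediction_error(predicted, actual):
    """Mean squared error between a prediction and its ground truth."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def anomaly_score(appearance_error, motion_error, alpha=0.5):
    """Hypothetical fusion of the two branch errors; alpha is an assumption."""
    return alpha * appearance_error + (1 - alpha) * motion_error


# Two toy prototypes standing in for learned normal features.
memory = [[1.0, 0.0], [0.0, 1.0]]
normal_feature = [1.0, 0.0]
abnormal_feature = [-1.0, -1.0]

normal_gap = prediction_error(memory_read(normal_feature, memory), normal_feature)
abnormal_gap = prediction_error(memory_read(abnormal_feature, memory), abnormal_feature)
```

Because the decoder in such a design only ever sees combinations of stored normal prototypes, a feature far from every prototype is rebuilt poorly, which is why `abnormal_gap` comes out larger than `normal_gap` in this toy setting.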

CLC number: