Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3564-3572. DOI: 10.11772/j.issn.1001-9081.2021122153

• The 21st China Conference on Virtual Reality •

Violence detection in video based on temporal attention mechanism and EfficientNet

Xingquan CAI, Dingwei FENG, Tong WANG, Chen SUN, Haiyan SUN

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  • Received: 2021-12-21  Revised: 2022-01-21  Accepted: 2022-01-26  Online: 2022-03-02  Published: 2022-11-10
  • Contact: Haiyan SUN
  • About author: CAI Xingquan, born in 1980 in Jinan, Shandong, Ph. D., professor, CCF senior member. His research interests include virtual reality, human-computer interaction, and deep learning.
    FENG Dingwei, born in 1997 in Qingdao, Shandong, M. S. candidate. His research interests include virtual reality and deep learning.
    WANG Tong, born in 1996 in Datong, Shanxi, M. S. candidate. His research interests include virtual reality and deep learning.
    SUN Chen, born in 1996 in Linyi, Shandong, M. S. His research interests include virtual reality and deep learning.
    SUN Haiyan, born in 1980 in Jining, Shandong, Ph. D., lecturer. Her research interests include virtual reality and deep learning. sunhaiyan80@hotmail.com
  • Supported by:
    Beijing Social Science Foundation (19YTC043)

Abstract:

Aiming at the problems of large numbers of model parameters, high computational complexity and low accuracy in general violence detection methods, a method of violence detection in video based on a temporal attention mechanism and EfficientNet was proposed. Firstly, the foreground images obtained by preprocessing the dataset were input into the network model to extract video features: the lightweight EfficientNet extracted frame-level spatial violence features from the foreground images, and a Convolutional Long Short-Term Memory (ConvLSTM) network further extracted global spatio-temporal features of the video sequence. Then, a video-level feature representation was computed by combining these features with the temporal attention mechanism. Finally, the video-level feature representation was mapped to the classification space, and a Softmax classifier was used to classify the video violence and output the detection results. Experimental results show that the proposed method reduces the number of model parameters and the computational complexity, and improves the accuracy of violence detection and the comprehensive performance of the model under limited resources.

Key words: violence detection, temporal attention mechanism, Convolutional Long Short-Term Memory (ConvLSTM) network, EfficientNet model
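The pipeline described in the abstract pools frame-level features into a single video-level representation via a temporal attention mechanism. The sketch below illustrates that pooling step with additive attention in NumPy; the scoring function and the parameter names `W` and `v` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def temporal_attention_pool(H, W, v):
    """Pool frame-level features H (T x D) into one video-level vector.

    Additive temporal attention sketch: score each frame, normalize the
    scores with a softmax over time, and return the weighted sum.
    """
    scores = np.tanh(H @ W) @ v          # (T,) unnormalized frame scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax over the time axis
    return alpha @ H, alpha              # (D,) video feature, (T,) weights

rng = np.random.default_rng(0)
T, D, A = 16, 8, 4                       # frames, feature dim, attention dim
H = rng.standard_normal((T, D))          # stand-in for ConvLSTM frame outputs
W = rng.standard_normal((D, A))          # hypothetical learned projection
v = rng.standard_normal(A)               # hypothetical learned scoring vector
video_feat, alpha = temporal_attention_pool(H, W, v)
```

In the full method, `video_feat` would then be mapped to the classification space and passed to a Softmax classifier; here random weights stand in for learned parameters.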

CLC number: