Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2043-2051. DOI: 10.11772/j.issn.1001-9081.2021050799

• Artificial intelligence •

Video playback speed recognition based on deep neural network

Rongyuan CHEN1, Jianmin YAO1,2, Qun YAN1,2, Zhixian LIN1

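  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108
  2. Jinjiang RichSense Electronic Technology Company Limited, Jinjiang, Fujian 362201
  • Received: 2021-05-17 Revised: 2021-10-14 Accepted: 2021-10-18 Online: 2021-10-14 Published: 2022-07-10
  • Corresponding author: YAN Qun
  • About the authors: CHEN Rongyuan, born in 1994 in Sanming, Fujian, male, M.S. candidate. His research interests include deep learning and video semantic understanding.
    YAO Jianmin, born in 1978 in Putian, Fujian, male, Ph.D., associate research fellow. His research interests include artificial intelligence, image processing, and information display.
    LIN Zhixian, born in 1975 in Quanzhou, Fujian, male, Ph.D., professor. His research interests include information display, flat panel display drive systems, and image processing.
  • Supported by:
    National Key Research and Development Program of China (2016YFB0401503); Science and Technology Major Program of Guangdong Province (2016B090906001); Science and Technology Major Program of Fujian Province (2014HZ0003-1); Open Fund of Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (2017B030301007)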

Video playback speed recognition based on deep neural network

Rongyuan CHEN1, Jianmin YAO1,2, Qun YAN1,2, Zhixian LIN1

  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China
  2. Jinjiang RichSense Electronic Technology Company Limited, Jinjiang, Fujian 362201, China
  • Received: 2021-05-17 Revised: 2021-10-14 Accepted: 2021-10-18 Online: 2021-10-14 Published: 2022-07-10
  • Contact: Qun YAN
  • About the authors: CHEN Rongyuan, born in 1994, M.S. candidate. His research interests include deep learning and video semantic understanding.
    YAO Jianmin, born in 1978, Ph.D., associate research fellow. His research interests include artificial intelligence, image processing, and information display.
    LIN Zhixian, born in 1975, Ph.D., professor. His research interests include information display, flat panel display drive systems, and image processing.
  • Supported by:
    National Key Research and Development Program of China (2016YFB0401503); Science and Technology Major Program of Guangdong Province (2016B090906001); Science and Technology Major Program of Fujian Province (2014HZ0003-1); Open Fund of Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (2017B030301007)

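Abstract:

To address the poor extraction accuracy and very large parameter counts of most existing video playback speed recognition algorithms, a dual-branch lightweight video playback speed recognition network is proposed. First, the network is a three-dimensional (3D) convolutional network built on the SlowFast dual-branch architecture. Second, to offset the large parameter count and high number of floating-point operations of the S3D-G network in the video playback speed recognition task, the network structure is adjusted to be lightweight. Finally, an Efficient Channel Attention (ECA) module is introduced into the network structure so that channel attention generates the channel range corresponding to the content of interest, which helps improve the accuracy of video feature extraction. The proposed network is compared with the S3D-G and SlowFast networks on the Kinetics-400 dataset. The experimental results show that, at comparable accuracy, the proposed network reduces both model size and parameter count by about 96% relative to SlowFast and lowers the number of floating-point operations to 5.36 GFLOPs, significantly improving running speed.

Keywords: deep neural network, video playback speed recognition, dual-branch network, channel attention, lightweight model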

Abstract:

Most current video playback speed recognition algorithms suffer from poor extraction accuracy and a large number of model parameters. To address these problems, a dual-branch lightweight video playback speed recognition network was proposed. First, the network was constructed as a three-dimensional (3D) convolutional network based on the SlowFast dual-branch architecture. Second, to reduce the large parameter count and the many floating-point operations of the S3D-G (Separable 3D convolutions network with Gating mechanism) network in video playback speed recognition tasks, a lightweight adjustment of the network structure was carried out. Finally, the Efficient Channel Attention (ECA) module was introduced into the network structure to generate the channel range corresponding to the content of interest, which helped to improve the accuracy of video feature extraction. In experiments, the proposed network was compared with the S3D-G and SlowFast networks on the Kinetics-400 dataset. Experimental results show that, at similar accuracy, the proposed network reduces both model size and parameter count by about 96% compared with the SlowFast network, and its number of floating-point operations is reduced to 5.36 GFLOPs, which significantly increases the running speed.
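As a rough illustration of the channel attention idea mentioned in the abstract, the sketch below follows the generally published ECA recipe: average each channel over all of its positions, run a small 1D convolution across neighbouring channels, apply a sigmoid, and rescale each channel by the resulting weight. It is a minimal pure-Python sketch, not the network described in the paper. The function names, the flat-list feature layout, and the kernel values are assumptions made for illustration, and in a real network the kernel would be learned.

```python
import math


def global_average(feature):
    """Average each channel over all of its positions.

    `feature` is a list of channels; each channel is a flat list of values
    (an assumed stand-in for a T x H x W activation volume).
    """
    return [sum(channel) / len(channel) for channel in feature]


def eca_weights(channel_means, kernel):
    """Return per-channel attention weights in (0, 1).

    A 1D kernel of odd length slides across the channel axis with zero
    padding, so each weight depends only on a few neighbouring channels.
    The kernel values are hypothetical; ECA learns them during training.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        s = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return weights


def eca(feature, kernel):
    """Rescale every channel of `feature` by its attention weight."""
    weights = eca_weights(global_average(feature), kernel)
    return [[w * v for v in channel] for w, channel in zip(weights, feature)]


# Three channels with two positions each, and an illustrative kernel.
feature = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
scaled = eca(feature, kernel=[0.0, 1.0, 0.0])
```

Because the only trainable values in this scheme are the few kernel entries, the attention step adds very little to the parameter count, which fits the lightweight goal stated in the abstract.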

Keywords: deep neural network, video playback speed recognition, dual-branch network, channel attention, lightweight model

CLC number: