Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (9): 2903-2910. DOI: 10.11772/j.issn.1001-9081.2023091242

• Multimedia computing and computer simulation •

Uncertainty-based frame associated short video event detection method

Yun LI1, Fuyou WANG2, Peiguang JING3, Su WANG4, Ao XIAO5

  1. School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning Guangxi 530003, China
    2. Institute of Electrification and Telecommunications, China Railway Design Corporation, Tianjin 300308, China
    3. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China
    4. College of Electronic Information, Guangxi Minzu University, Nanning Guangxi 530006, China
    5. School of Computer and Electronic Information, Guangxi University, Nanning Guangxi 530004, China
  • Received: 2023-09-18 Revised: 2023-12-11 Accepted: 2023-12-12 Online: 2024-03-15 Published: 2024-09-10
  • Contact: Fuyou WANG
  • About author:LI Yun, born in 1978, Ph. D., professor. Her research interests include big data, artificial intelligence.
    JING Peiguang, born in 1988, Ph. D., associate professor. His research interests include multimedia computing, underwater image processing.
    WANG Su, born in 1998, M. S. candidate. His research interests include multimodal fusion.
    XIAO Ao, born in 1999, M. S. candidate. His research interests include multimodal fusion.
  • Supported by:
    National Natural Science Foundation of China (61861014); Doctoral Start-up Fund (BS2021025)

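Uncertainty-aware frame-associated short video event detection method

Yun LI1, Fuyou WANG2, Peiguang JING3, Su WANG4, Ao XIAO5

  1. School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530003
    2. Institute of Electrification and Telecommunications, China Railway Design Corporation, Tianjin 300308
    3. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072
    4. College of Electronic Information, Guangxi Minzu University, Nanning 530006
    5. School of Computer and Electronic Information, Guangxi University, Nanning 530004
  • Corresponding author: Fuyou WANG
  • About the authors: LI Yun (born in 1978), female (Zhuang ethnicity), from Nanning, Guangxi, professor, Ph. D., CCF member. Main research interests: big data, artificial intelligence.
    JING Peiguang (born in 1988), male, from Tianjin, associate professor, Ph. D. Main research interests: multimedia computing, underwater image processing.
    WANG Su (born in 1998), male, from Yangzhou, Jiangsu, M. S. candidate. Main research interests: multimodal fusion.
    XIAO Ao (born in 1999), male, from Hengyang, Hunan, M. S. candidate. Main research interests: multimodal fusion.
  • Supported by:
    National Natural Science Foundation of China (61861014); Doctoral Start-up Fund (BS2021025)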

Abstract:

To address the problem of jointly exploiting frame-level uncertainty and temporal correlation in short videos to improve event detection, an uncertainty-aware, frame-associated short video event detection method is proposed. First, a 2D Convolutional Neural Network (CNN) extracts features from each frame of the short video; these features are then forward-propagated multiple times through a Bayesian variational layer to obtain the feature mean and the corresponding uncertainty information. Second, an uncertainty perception module fuses the feature mean with the uncertainty information, and a temporal correlation module strengthens the time-domain relationships among the fused frame features. Finally, the temporally correlated features are passed to a classification network to perform short video event detection. Comparative experiments were conducted on a short video event detection dataset crawled from the Flickr platform. The results show that subspace learning methods such as Support Vector Machine (SVM) classify poorly and do not sufficiently explore high-level semantic representations, whereas deep learning methods achieve markedly higher event detection accuracy. Compared with the Sparse Video-Text Transformer (SViTT) method, the proposed method improves accuracy, Average Recall (AR), and Average Precision (AP) by 3.37%, 2.55%, and 2.09%, respectively, which verifies its effectiveness for short video event detection.
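The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the layer definitions, so everything concrete here is an assumption made for illustration and is not the authors' implementation: the Bayesian variational layer is modeled as a linear layer with independent Gaussian weights, the fusion rule `mean / (1 + variance)` is hypothetical, and the temporal correlation step is approximated by plain dot-product attention across frames. The per-frame input vectors stand in for 2D CNN features, and the classification network is omitted.

```python
import math
import random
import statistics


def variational_linear(x, w_mu, w_sigma, rng):
    """One stochastic pass: sample each weight from N(mu, sigma^2), then apply it to x."""
    return [
        sum(rng.gauss(mu, sigma) * xi for mu, sigma, xi in zip(mu_row, sigma_row, x))
        for mu_row, sigma_row in zip(w_mu, w_sigma)
    ]


def mc_mean_and_uncertainty(x, w_mu, w_sigma, passes, rng):
    """Repeat the stochastic pass and return the per-dimension mean and variance."""
    samples = [variational_linear(x, w_mu, w_sigma, rng) for _ in range(passes)]
    per_dim = list(zip(*samples))
    mean = [statistics.fmean(values) for values in per_dim]
    variance = [statistics.pvariance(values) for values in per_dim]
    return mean, variance


def fuse(mean, variance):
    """Hypothetical fusion rule (an assumption): shrink high-variance dimensions."""
    return [m / (1.0 + v) for m, v in zip(mean, variance)]


def temporal_correlate(frames):
    """Assumed stand-in for the temporal module: dot-product attention over frames."""
    dim = len(frames[0])
    out = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in frames]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * frame[j] for w, frame in zip(weights, frames)) / total
            for j in range(dim)
        ])
    return out


def video_representation(frame_features, w_mu, w_sigma, passes, rng):
    """Per-frame mean and uncertainty, fusion, temporal mixing, then mean pooling."""
    fused = []
    for x in frame_features:
        mean, variance = mc_mean_and_uncertainty(x, w_mu, w_sigma, passes, rng)
        fused.append(fuse(mean, variance))
    mixed = temporal_correlate(fused)
    return [statistics.fmean(column) for column in zip(*mixed)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for per-frame CNN features (three frames, three dimensions).
    frames = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.2], [0.9, 0.1, 0.4]]
    w_mu = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    w_sigma = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]
    print(video_representation(frames, w_mu, w_sigma, passes=20, rng=rng))
```

In this sketch the variance across passes plays the role of the uncertainty information: a frame whose features vary strongly between passes contributes less after fusion. The pooled vector returned by `video_representation` is what a classification network would consume.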

Key words: temporal correlation, frame associated short video event, Convolutional Neural Network (CNN), Bayesian neural network, uncertainty

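Abstract:

To address the problem of how to combine the frame uncertainty and temporal correlation of short videos to enhance event detection capability, a frame-associated short video event detection method based on uncertainty perception is proposed. First, a 2D Convolutional Neural Network (CNN) is used to extract the features of each frame of the short video; these features are then forward-propagated multiple times, and the feature mean and the uncertainty information corresponding to the features are obtained through a Bayesian variational layer. Second, the uncertainty perception module constructed by the model fuses the feature mean with the uncertainty information, and the fused frame features are passed through the temporal correlation module to strengthen their relationships in the time domain. Finally, the temporally correlated features are fed to a classification network to perform short video event detection. Comparative experiments were carried out on a short video event detection dataset crawled from the Flickr platform. The experimental results show that subspace learning methods such as Support Vector Machine (SVM) have poor classification performance and do not sufficiently explore high-level semantic representations, whereas deep learning methods achieve clearly better event detection accuracy. Compared with the SViTT (Sparse Video-Text Transformer) method, the proposed method improves accuracy, average recall, and average precision by 3.37%, 2.55%, and 2.09%, respectively, verifying its effectiveness on the short video event detection task.

Key words: temporal correlation, frame associated short video event, Convolutional Neural Network (CNN), Bayesian neural network, uncertainty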

CLC Number: