Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (7): 1908-1914. DOI: 10.11772/j.issn.1001-9081.2020091512

Special Issue: Artificial Intelligence

• Artificial intelligence •

Video summarization generation model based on improved bi-directional long short-term memory network

WU Guangli1,2, LI Leiting1, GUO Zhenzhou1, WANG Chengxiang1   

  1. School of Cyberspace Security, Gansu University of Political Science and Law, Lanzhou, Gansu 730070, China;
    2. Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education (Northwest Minzu University), Lanzhou, Gansu 730030, China
  • Received: 2020-09-28  Revised: 2020-12-22  Online: 2021-07-10  Published: 2020-12-31
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Gansu Province (17JR5RA161), the Youth Science and Technology Program of Gansu Province (18JR3RA193), the Lanzhou Talent Innovation and Entrepreneurship Project (2020-RC-27), the Colleges and Universities of Gansu Province Innovation Ability Improvement Project (2020B-167), and the Longyuan Youth Innovation and Entrepreneurship Talent Project (2021LQGR20).

  • Corresponding author: WU Guangli
  • About the authors: WU Guangli, born in 1981 in Weifang, Shandong, is a professor, Ph.D., and CCF member; his research interests include video content understanding and artificial intelligence. LI Leiting, born in 1996 in Jining, Shandong, is a master's candidate; his research interests include video content understanding and artificial intelligence. GUO Zhenzhou, born in 1995 in Puyang, Henan, is a master's candidate; his research interests include video content understanding and artificial intelligence. WANG Chengxiang, born in 1995 in Kaifeng, Henan, is a master's candidate; his research interests include video content understanding and artificial intelligence.

Abstract: Traditional video summarization methods often fail to consider temporal information, and the video features they extract are overly complex and prone to overfitting. To solve these problems, a video summarization generation model based on an improved Bi-directional Long Short-Term Memory (BiLSTM) network was proposed. Firstly, the deep features of the video frames were extracted by a Convolutional Neural Network (CNN), and, in order to make the generated video summarization more diverse, the BiLSTM network was adopted to convert the deep feature recognition task into a sequence feature annotation task over the video frames, so that the model was able to obtain more context information. Secondly, considering that the generated video summarization should be representative, max pooling was fused into the model to reduce the feature dimension while highlighting the key information and weakening the redundant information, so that the model was able to learn representative features; the reduced feature dimension also lowered the number of parameters required in the fully connected layer, which avoided the overfitting problem. Finally, the importance scores of the video frames were predicted and converted into shot scores, which were used to select the key shots and generate the video summarization. Experimental results on the two standard datasets TVSum and SumMe show that the improved model improves the accuracy of video summarization generation, with its F1-score improved by 1.4 and 0.3 percentage points respectively compared with DPPLSTM (Determinantal Point Process Long Short-Term Memory), an existing video summarization model based on the Long Short-Term Memory (LSTM) network.
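The pipeline described above (CNN frame features, BiLSTM sequence labelling, max-pooling fusion, and a fully connected scoring layer) can be illustrated with a minimal Python (PyTorch) sketch. The code below is not the authors' released implementation; the 1024-dimensional frame features, the hidden size of 256, and the pooling window of 4 are assumptions made purely for illustration.

import torch
import torch.nn as nn

class BiLSTMSummarizer(nn.Module):
    # Minimal sketch of the improved BiLSTM summarizer described in the abstract.
    # Assumed sizes: 1024-d CNN frame features, 256 hidden units per direction,
    # max pooling with window 4 over the feature axis (all illustrative choices).
    def __init__(self, feat_dim=1024, hidden=256, pool_size=4):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Max pooling over the feature dimension shrinks 2*hidden to 2*hidden/pool_size,
        # which in turn shrinks the fully connected layer and its parameter count.
        self.pool = nn.MaxPool1d(pool_size)
        self.fc = nn.Linear(2 * hidden // pool_size, 1)

    def forward(self, frame_feats):
        # frame_feats: (batch, n_frames, feat_dim) CNN features, one row per sampled frame
        h, _ = self.bilstm(frame_feats)        # (batch, n_frames, 2*hidden)
        h = self.pool(h)                       # (batch, n_frames, 2*hidden // pool_size)
        scores = torch.sigmoid(self.fc(h))     # (batch, n_frames, 1)
        return scores.squeeze(-1)              # frame importance scores in [0, 1]

# Example: score 300 sampled frames of one video
model = BiLSTMSummarizer()
feats = torch.randn(1, 300, 1024)
frame_scores = model(feats)                    # shape (1, 300)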

Key words: video summarization, Convolutional Neural Network (CNN), Bi-directional Long Short-Term Memory (BiLSTM) network, max pooling
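The abstract states that the predicted frame importance scores are converted into shot scores, from which key shots are selected, but it does not spell out the selection rule. A common convention in this line of work, assumed here only for illustration, is to average the frame scores within each shot and then choose shots with a 0/1 knapsack under a summary-length budget (often about 15% of the video length). A small Python sketch of that reading:

from typing import List, Tuple

def shot_scores(frame_scores: List[float],
                shots: List[Tuple[int, int]]) -> List[float]:
    # Average the frame importance scores inside each (start, end) shot span.
    return [sum(frame_scores[s:e]) / max(e - s, 1) for s, e in shots]

def select_key_shots(scores: List[float],
                     lengths: List[int],
                     budget: int) -> List[int]:
    # 0/1 knapsack over shots: maximise total shot score within a frame budget.
    # A budget of roughly 15% of the video length is a common convention,
    # assumed here rather than stated in the abstract.
    n = len(scores)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if lengths[i - 1] <= b:
                cand = dp[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if cand > dp[i][b]:
                    dp[i][b] = cand
    # Backtrack to recover the indices of the chosen shots.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)

Shot boundaries themselves would come from a separate temporal segmentation step, which the abstract does not describe.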

