Video frame prediction based on deep convolutional long short-term memory neural network

doi:10.11772/j.issn.1001-9081.2018122551

Journal of Computer Applications, 2019, Vol. 39, Issue (6): 1657-1662.

Artificial intelligence


ZHANG Dezheng, WENG Liguo, XIA Min, CAO Hui

Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (Nanjing University of Information Science & Technology), Nanjing, Jiangsu 210044, China

Received: 2018-12-26 Revised: 2019-03-17 Online: 2019-06-17 Published: 2019-06-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61503192, 61773219), the Natural Science Foundation of Jiangsu Province (BK20161533), and the Qing Lan Project of Jiangsu Province.

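Corresponding author: XIA Min
About the authors: ZHANG Dezheng (1995-), male, born in Siyang, Jiangsu, M. S. candidate. His research interests include machine learning and big data analysis. WENG Liguo (1981-), male, born in Nanjing, Jiangsu, Ph. D., associate professor. His research interests include machine learning and big data analysis. XIA Min (1983-), male, born in Dongtai, Jiangsu, Ph. D., associate professor. His research interests include machine learning and big data analysis. CAO Hui (1993-), male, born in Huai'an, Jiangsu, M. S. candidate. His research interests include machine learning and big data analysis.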

Abstract

To address the difficulty of accurately predicting fine spatial-structure details in video frame prediction, a deep convolutional Long Short-Term Memory (LSTM) neural network was proposed as an improvement on the convolutional LSTM neural network. Firstly, the input image sequence was fed into an encoding network composed of two deep convolutional LSTMs with different numbers of channels, which learned how the position information and the spatial-structure information of the input sequence change over time. Then, the learned change features were fed into a decoding network whose channels correspond to those of the encoding network, and the decoding network output the predicted next frame. Finally, this frame was fed back into the decoding network to predict the following frame, and all predicted frames were output after a preset number of iterations. In experiments on the Moving-MNIST dataset with the same number of training steps, the proposed method retained the accurate position prediction of the convolutional LSTM neural network while representing spatial-structure details more strongly. In addition, deepening the convolutional layers of the convolutional Gated Recurrent Unit (GRU) neural network also improved the representation of spatial-structure details, verifying that the idea behind the proposed method generalizes.
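The encode-then-feed-back procedure described in the abstract can be sketched in NumPy as below. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation. The sketch assumes that "deep" means stacking several convolutions (with ReLU between them) in each ConvLSTM cell's gate transform. It also assumes that the two encoder branches differ only in hidden channel count (the hypothetical `channels=(8, 16)`), that each decoder branch starts from the final state of its matching encoder branch, that the first decoder input is the last observed frame, and that a 1x1 convolution merges the branches into one grayscale frame. Peephole terms of the original ConvLSTM and all biases are omitted. Every class and function name here is hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """'Same'-padded 2D cross-correlation (the convention of most DL libraries).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) for every output pixel at once
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DeepConvLSTMCell:
    """Hypothetical ConvLSTM cell whose gate transform stacks `depth` convolutions.
    depth=1 reduces to a plain ConvLSTM without peephole terms."""

    def __init__(self, in_ch, hid_ch, depth=2, k=3, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.hid_ch = hid_ch
        chans = [in_ch + hid_ch] + [hid_ch] * (depth - 1) + [4 * hid_ch]
        self.ws = [rng.normal(0.0, 0.1, (co, ci, k, k))
                   for ci, co in zip(chans[:-1], chans[1:])]

    def init_state(self, H, W):
        z = np.zeros((self.hid_ch, H, W))
        return z, z.copy()

    def __call__(self, x, state):
        h, c = state
        z = np.concatenate([x, h], axis=0)   # stack input and hidden state
        for n, w in enumerate(self.ws):
            z = conv2d_same(z, w)
            if n < len(self.ws) - 1:
                z = np.maximum(z, 0.0)       # ReLU between stacked convolutions
        i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


class EncoderDecoderPredictor:
    """Two encoder branches with different channel counts, matching decoder branches,
    and an autoregressive loop that feeds each predicted frame back in."""

    def __init__(self, channels=(8, 16), depth=2, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.enc = [DeepConvLSTMCell(1, c, depth, rng=rng) for c in channels]
        self.dec = [DeepConvLSTMCell(1, c, depth, rng=rng) for c in channels]
        # 1x1 readout merging all decoder branches into one grayscale frame
        self.w_out = rng.normal(0.0, 0.1, (1, sum(channels), 1, 1))

    def predict(self, frames, n_future):
        """frames: (T, H, W) with values in [0, 1]; returns (n_future, H, W)."""
        H, W = frames.shape[1:]
        states = [cell.init_state(H, W) for cell in self.enc]
        for frame in frames:  # encode the observed frames
            states = [cell(frame[None], s) for cell, s in zip(self.enc, states)]
        x, preds = frames[-1][None], []
        for _ in range(n_future):  # decode, feeding each prediction back in
            states = [cell(x, s) for cell, s in zip(self.dec, states)]
            hs = np.concatenate([h for h, _ in states], axis=0)
            x = sigmoid(conv2d_same(hs, self.w_out))  # (1, H, W) next frame
            preds.append(x[0])
        return np.stack(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    clip = rng.random((10, 16, 16))  # stand-in for 10 observed 16x16 frames
    model = EncoderDecoderPredictor(channels=(8, 16), depth=2)
    future = model.predict(clip, n_future=5)
    print(future.shape)  # (5, 16, 16)
```

Because the weights are random, the output only demonstrates tensor shapes and the feedback loop. Training, for example with a per-pixel loss on Moving-MNIST, is outside this sketch. The same stacking of gate convolutions could be applied to a ConvGRU cell to mirror the GRU experiment mentioned in the abstract.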

Key words: video frame prediction, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) neural network, encoding prediction, convolutional Gated Recurrent Unit (GRU)



CLC Number: TP183

ZHANG Dezheng, WENG Liguo, XIA Min, CAO Hui. Video frame prediction based on deep convolutional long short-term memory neural network[J]. Journal of Computer Applications, 2019, 39(6): 1657-1662.



