结合内卷与卷积算子的视频预测模型

doi:10.11772/j.issn.1001-9081.2023060853

《计算机应用》唯一官方网站 ›› 2024, Vol. 44 ›› Issue (1): 113-122.DOI: 10.11772/j.issn.1001-9081.2023060853

• 人工智能 • 上一篇

结合内卷与卷积算子的视频预测模型

朱俊宏¹, 赖俊宇¹^,²(), 甘炼强¹, 陈智勇¹, 刘华烁¹, 徐国尧¹

^1.电子科技大学航空航天学院, 成都 611731
^2.飞行器集群智能感知与协同控制四川省重点实验室(电子科技大学), 成都 611731

收稿日期:2023-06-30 修回日期:2023-10-10 接受日期:2023-10-13 发布日期:2024-01-24 出版日期:2024-01-10
通讯作者: 赖俊宇
作者简介:朱俊宏（1998—），男，四川德阳人，硕士研究生，主要研究方向：计算机视觉、视频预测；
甘炼强（2000—），男，四川广安人，硕士研究生，主要研究方向：计算机视觉、视频预测；
陈智勇（1997—），男，湖南岳阳人，硕士研究生，主要研究方向：机器学习、深度学习；
刘华烁（2000—），男，河北衡水人，硕士研究生，主要研究方向：深度学习、强化学习；
徐国尧（2000—），男，湖北黄冈人，硕士研究生，主要研究方向：深度学习、强化学习。
第一联系人：赖俊宇（1981—），男，四川德阳人，副教授，博士，主要研究方向：计算机视觉、计算机网络；
基金资助:
四川省重点研发计划项目(2022YFS0546)

Video prediction model combining involution and convolution operators

Junhong ZHU¹, Junyu LAI¹^,²(), Lianqiang GAN¹, Zhiyong CHEN¹, Huashuo LIU¹, Guoyao XU¹

^1.School of Aeronautics and Astronautics，University of Electronic Science and Technology of China，Chengdu Sichuan 611731，China
^2.Aircraft Swarm Intelligent Sensing and Cooperative Control Key Laboratory of Sichuan Province （University of Electronic Science and Technology of China），Chengdu Sichuan 611731，China

Received:2023-06-30 Revised:2023-10-10 Accepted:2023-10-13 Online:2024-01-24 Published:2024-01-10
Contact: Junyu LAI
About author:ZHU Junhong， born in 1998， M. S. candidate. His research interests include computer vision， video prediction.
GAN Lianqiang， born in 2000， M. S. candidate. His research interests include computer vision， video prediction.
CHEN Zhiyong， born in 1997， M. S. candidate. His research interests include machine learning， deep learning.
LIU Huashuo， born in 2000， M. S. candidate. His research interests include deep learning， reinforcement learning.
XU Guoyao， born in 2000， M. S. candidate. His research interests include deep learning， reinforcement learning.
Supported by:
Key Research and Development Project of Sichuan Province(2022YFS0546)

摘要/Abstract

摘要：

针对基于传统深度学习的视频预测中对数据空间特征提取效果不佳及预测精度低的问题，提出一种结合内卷与卷积算子（CICO）的视频预测模型。该模型主要通过以下三个方面提高视频序列的预测性能：首先，采用不同大小的卷积核增强对数据多粒度空间特征的提取能力，较大的卷积核能够提取更大空间范围的特征，而较小的卷积核可更精确地捕获视频目标的运动细节，实现对目标多角度表征学习；其次，用计算效率更高、参数更少的内卷算子替代核较大的卷积算子，内卷通过高效的通道间交互避免了大量的不必要参数，在降低计算和存储成本的同时提升模型预测能力；最后，引入核为1×1的卷积进行线性映射，增强不同特征之间的联合表达，提高了模型参数的利用效率并增强了预测的鲁棒性。通过多个数据集对该模型进行全面测试，结果表明，相较于目前最优的SimVP（Simpler yet better Video Prediction）模型，所提模型在多项指标上均有显著提升。在移动手写数据集上，均方误差和平均绝对误差分别降低25.2%和17.4%；在北京交通数据集上，均方误差降低1.2%；在人体行为数据集上，结构相似性指数和峰值信噪比分别提高0.66%和0.47%。可见，所提模型在提升视频预测精度方面十分有效。

关键词: 深度学习, 视频预测, 内卷, 卷积, 时空信息

Abstract:

To address the inadequate feature extraction from data space and low prediction accuracy in traditional deep learning based video prediction， a video prediction model Combining Involution and Convolution Operators （CICO） was proposed. The model enhanced video prediction performance through three aspects. Firstly， convolutions with varying kernel sizes were adopted to enhance extraction ability of multi-granularity spatial features and enable multi-angle representational learning of targets. In particular， larger kernels were applied to extract features from broader spatial ranges， while smaller kernels were employed to capture motion details more precisely. Secondly， large-kernel convolutions were replaced by the computationally efficient involution operators with fewer parameters in order to achieve efficient inter-channel interaction， avoid redundant parameters， decrease computational and storage costs. The predictive capacity of the model was enhanced at the same time. Finally， convolutions with kernel size 1×1 were introduced for linear mapping to strengthen joint expression between distinct features， improve parameter utilization efficiency， and strengthen prediction robustness. The proposed model’s superiority was validated through comprehensive experiments on various datasets， resulting in significant improvements over the state-of-the-art SimVP （Simpler yet Better Video Prediction） model. On Moving MNIST dataset， the Mean Squared Error （MSE） and Mean Absolute Error （MAE） were reduced by 25.2% and 17.4%， respectively. On Traffic Beijing dataset， the MSE was reduced by 1.2%. On KTH dataset， the Structure Similarity Index Measure （SSIM） and Peak Signal-to-Noise Ratio （PSNR） were improved by 0.66% and 0.47%， respectively. It can be seen that the proposed model is very effective in improving accuracy of video prediction.

Key words: deep learning, video prediction, involution, convolution, spatiotemporal information

中图分类号:

TP181

朱俊宏, 赖俊宇, 甘炼强, 陈智勇, 刘华烁, 徐国尧. 结合内卷与卷积算子的视频预测模型[J]. 计算机应用, 2024, 44(1): 113-122.

Junhong ZHU, Junyu LAI, Lianqiang GAN, Zhiyong CHEN, Huashuo LIU, Guoyao XU. Video prediction model combining involution and convolution operators[J]. Journal of Computer Applications, 2024, 44(1): 113-122.

图/表 17

图1 CICO-VP模型架构

Fig. 1 Architecture of CICO-VP model

图2 编码器结构

Fig. 2 Structure of encoder

图3 转换器结构

Fig. 3 Structure of convertor

图4 解码器结构

Fig. 4 Structure of decoder

表1 不同数据集的实验参数配置

Tab. 1 Experiment parameter settings for different datasets

数据集	训练样本数	测试样本数	图像规格	输入帧数	输出帧数
移动手写	10 000	10 000	（1， 64， 64）	10	10
北京交通	19 627	1 334	（2， 32， 32）	4	4
人体行为	5 200	3 167	（1， 128， 128）	10	20

表2 不同数据集的模型编解码器超参数值

Tab. 2 Hyper-parameter values of encoder and decoder for different datasets

数据集	N_e和N_d	H_e和H_d	卷积核大小
移动手写	4	64	3×3和5×5
北京交通	3	64	3×3和5×5
人体行为	3	32	3×3和5×5

表3 不同数据集的转换器超参数值

Tab. 3 Hyper-parameter values of convertor for different datasets

数据集	ConvInvo模块个数N_c	转换器隐藏层数量H_c	卷积算子核大小	内卷算子核大小
移动手写	4	512	3×3	11×11
北京交通	3	128	3×3	11×11
人体行为	4	256	3×3	11×11

图5 CICO-VP在移动手写数据集的训练过程

Fig. 5 Training process of CICO-VP on Moving MNIST dataset

表4 各模型在移动手写数据集性能比较

Tab. 4 Performance comparison of different models on Moving MNIST dataset

模型	MSE↓	MAE↓	SSIM↑
ConvLSTM^［1］	103.3	182.9	0.707
MIM^［8］	44.2	101.1	0.910
PredRNN^［30］	56.8	126.1	0.867
CausalLSTM^［32］	46.5	106.8	0.898
E3D-LSTM^［34］	41.3	86.4	0.910
SimVP^［35］	23.8	68.9	0.948
PhyDNet^［42］	24.4	70.3	0.947
CICO-VP	17.8	56.9	0.961

图6 移动手写数据集实验结果

Fig. 6 Experimental results on Moving MNIST dataset

图7 CICO-VP在北京交通数据集的训练过程

Fig. 7 Training process of CICO-VP on Traffic Beijing dataset

表5 各模型在北京交通数据集性能比较

Tab. 5 Performance comparison of different models on Traffic Beijing dataset

模型	MSE×100↓	MAE↓	SSIM↑
ConvLSTM^［1］	48.5	17.7	0.978
MIM^［8］	42.9	16.6	0.971
PredRNN^［30］	46.4	17.1	0.971
CausalLSTM^［32］	44.8	16.9	0.977
E3D-LSTM^［34］	43.2	16.9	0.979
SimVP^［35］	41.4	16.2	0.982
PhyDNet^［42］	41.9	16.2	0.982
CICO-VP	40.9	16.2	0.982

图8 北京交通数据集实验结果

Fig. 8 Experimental results on Traffic Beijing dataset

图9 CICO-VP在人体行为数据集的训练过程

Fig. 9 Training process of CICO-VP on KTH dataset

表6 人体行为数据集实验结果

Tab. 6 Experiment results of KTH dataset

模型	SSIM↑	PSNR/dB↑	模型	SSIM↑	PSNR/dB↑
ConvLSTM^［1］	0.712	23.58	SVAP-VAE^［43］	0.852	27.77
SV2P^［5］	0.838	27.79	VPN^［44］	0.746	23.76
PredRNN^［30］	0.839	27.55	DFN^［45］	0.794	27.26
PredRNN++^［32］	0.865	28.47	fRNN^［46］	0.771	26.12
E3d-LSTM^［34］	0.879	29.31	Znet^［47］	0.817	27.50
SimVP^［35］	0.905	33.72	VarNet^［48］	0.843	28.48
MCnet^［40］	0.804	25.95	STMFANet^［49］	0.893	29.85
SAVP^［43］	0.746	25.38	CICO-VP	0.911	33.88

图10 人体行为数据集实验结果

Fig. 10 Experimental results on KTH dataset

图11 移动手写数据集消融实验结果

Fig. 11 Ablation experiment results on Moving MNIST dataset

参考文献 49

1	CHANG Z， ZHANG X， WANG S， et al. STRPM： A spatiotemporal residual predictive model for high-resolution video prediction ［C］// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2022： 13926-13935. 10.1109/cvpr52688.2022.01356
2	WU H， YAO Z， WANG J， et al. MotionRNN： A flexible model for video prediction with spacetime-varying motions ［C］// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 15435-15444. 10.1109/cvpr46437.2021.01518
3	LIU B， CHEN Y， LIU S， et al. Deep learning in latent space for video prediction and compression ［C］// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 701-710. 10.1109/cvpr46437.2021.00076
4	SHI X， CHEN Z， WANG H， et al. Convolutional LSTM network： A machine learning approach for precipitation nowcasting ［C］// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge： MIT Press， 2015： 802-810.
5	BABAEIZADEH M， FINN C， ERHAN D， et al. Stochastic variational video prediction ［EB/OL］. ［2023-01-05］. .
6	MARTÍNEZ-GONZÁLEZ A， VILLAMIZAR M， CANÉVET O， et al. Efficient convolutional neural networks for depth-based multi-person pose estimation ［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2019， 30（11）： 4207-4221. 10.1109/tcsvt.2019.2952779
7	KONG Y， FU Y. Human action recognition and prediction： A survey ［J］. International Journal of Computer Vision， 2022， 130： 1366-1401. 10.1007/s11263-022-01594-9
8	WANG Y， ZHANG J， ZHU H， et al. Memory in memory： A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 9146-9154. 10.1109/cvpr.2019.00937
9	OPREA S， MARTINEZ-GONZALEZ P， GARCIA-GARCIA A， et al. A review on deep learning techniques for video prediction ［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2022， 44（6）： 2806-2826. 10.1109/tpami.2020.3045007
10	KINGMA D P， WELLING M. Auto-encoding variational Bayes ［C/OL］// Proceedings of the 2nd International Conference on Learning Representations. ［2023-01-05］. . 10.1561/2200000056
11	REZENDE D J， MOHAMED S， Stochastic WIERSTRA D.. backpropagation and approximate inference in deep generative models ［C］// Proceedings of the 31st International Conference on Machine Learning. New York： JMLR.org， 2014： 1278-1286.
12	KIPF T N， WELLING M. Variational graph auto-encoders ［EB/OL］. ［2023-01-05］. .
13	RADFORD A， METZ L， CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks ［C/OL］// Proceedings of the 2016 International Conference on Machine Learning. ［2023-01-05］. . 10.1109/aiar.2018.8769811
14	GULRAJANI I， AHMED F， ARJOVSKY M， et al. Improved training of Wasserstein GANs ［C］// Proceedings of the 2017 International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc， 2017： 5767-5777.
15	KARRAS T， AILA T， LAINE S， et al. Progressive growing of GANs for improved quality， stability， and variation ［C/OL］// Proceedings of the 2018 International Conference on Learning Representations. ［2023-01-05］. . 10.1109/cvpr42600.2020.00813
16	BROCK A， DONAHUE J， SIMONYAN K. Large scale GAN training for high fidelity natural image synthesis ［C/OL］// Proceedings of the 2019 International Conference on Learning Representations. ［2023-01-05］. .
17	KARRAS T， LAINE S， AILA T. A style-based generator architecture for generative adversarial networks ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 4396-4405. 10.1109/cvpr.2019.00453
18	KARRAS T， AITTALA M， HELLSTEN J， et al. Training generative adversarial networks with limited data ［C］// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2020： 12104-12114.
19	KARRAS T， AITTALA M， LAINE S， et al. Alias-free generative adversarial networks ［C/OL］// Proceedings of the 35th International Conference on Neural Information Processing Systems. ［2023-01-05］. . 10.1007/978-3-030-93158-2_7
20	GU J T， LIU L J， WANG P， et al. StyleNeRF： A style-based 3d-aware generator for high-resolution image synthesis ［C/OL］// Proceedings of the 2022 International Conference on Learning Representations. ［2023-01-05］. .
21	CHAN E R， MONTEIRO M， KELLNHOFER P， et al. pi-GAN： Periodic implicit generative adversarial networks for 3D-aware image synthesis ［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 5795-5805. 10.1109/cvpr46437.2021.00574
22	WALKER J， MARINO K， GUPTA A， et al. The pose knows： Video forecasting by generating pose futures ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 3352-3361. 10.1109/iccv.2017.361
23	HU Q Y， WAELCHLI A， PORTENIER T， et al. Video synthesis from a single image and motion stroke ［EB/OL］. ［2023-01-07］. .
24	WU B， NAIR S， MARTÍN-MARTÍN R， et al. Greedy hierarchical variational autoencoders for large-scale video prediction ［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 2318-2328. 10.1109/cvpr46437.2021.00235
25	WEN S， LIU W， YANG Y， et al. Generating realistic videos from keyframes with concatenated GANs ［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2019， 29（8）： 2337-2348. 10.1109/tcsvt.2018.2867934
26	MATHIEU M， COUPRIE C， LeCUN Y. Deep multi-scale video prediction beyond mean square error ［C/OL］// Proceedings of the 2016 International Conference on Learning Representations. ［2023-01-05］. .
27	VONDRICK C， TORRALBA A. Generating the future with adversarial transformers ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 2992-3000. 10.1109/cvpr.2017.319
28	HOCHREITER S， SCHMIDHUBER J. Long short-term memory ［J］. Neural Computation， 1997， 9（8）： 1735-1780. 10.1162/neco.1997.9.8.1735
29	FINN C， GOODFELLOW I， LEVINE S. Unsupervised learning for physical interaction through video prediction ［C］// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc， 2016： 64-72.
30	WANG Y， LONG M， WANG J， et al. PredRNN： Recurrent neural networks for predictive learning using spatiotemporal LSTMs ［C］// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2017： 879-888.
31	WANG Y， WU H， ZHANG J， et al. PredRNN： A recurrent neural network for spatiotemporal predictive learning ［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2023， 45（2）： 2208-2225. 10.1109/tpami.2022.3165153
32	WANG Y， GAO Z， LONG M， et al. PredRNN++： Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning ［C］// Proceedings of the 35th International Conference on Machine Learning. New York： JMLR.org， 2018： 5123-5132.
33	LOTTER W， KREIMAN G， COX D. Deep predictive coding networks for video Prediction and unsupervised learning ［C/OL］// Proceedings of the 2017 International Conference on Learning Representations. ［2023-01-05］. .
34	WANG Y， JIANG L， YANG M-H， et al. Eidetic 3D LSTM： A model for video prediction and beyond ［C/OL］// Proceedings of the 2018 International Conference on Learning Representations. ［2023-01-05］. .
35	GAO Z， TAN C， WU L， et al. SimVP： Simpler yet better video prediction ［C］// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2022： 3160-3170. 10.1109/cvpr52688.2022.00317
36	朱俊宏，赖俊宇，刘华烁，等.一种基于多层卷积结构的视频帧预测方法： CN116567258A ［P］. 2023-08-08.
	ZHU J H， LAI J Y， LIU H S， et al. A video frame prediction method based on multi-layer convolution structure： CN116567258A ［P］. 2023-08-08.
37	SRIVASTAVA N， MANSIMOV E， SALAKHUTDINOV R. Unsupervised learning of video representations using LSTMs ［C］// Proceedings of the 32nd International Conference on Machine Learning. New York： JMLR.org， 2015： 843-852. 10.1109/iccv.2015.320
38	ZHANG J， ZHENG Y， QI D. Deep spatio-temporal residual networks for citywide crowd flows prediction ［C］// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2017： 1655-1661. 10.1609/aaai.v31i1.10735
39	SCHULDT C， LAPTEV I， CAPUTO B. Recognizing human actions： a local SVM approach ［C］// Proceedings of the 17th International Conference on Pattern Recognition. Piscataway： IEEE， 2004： 32-36. 10.1109/icpr.2004.1334462
40	VILLEGAS R， YANG J， HONG S， et al. Decomposing motion and content for natural video sequence prediction ［C/OL］// Proceedings of the 2017 International Conference on Learning Representations. ［2023-01-05］. .
41	WANG Z， BOVIK A C， SHEIKH H R， et al. Image quality assessment： from error visibility to structural similarity ［J］. IEEE Transactions on Image Processing， 2004， 13（4）： 600-612. 10.1109/tip.2003.819861
42	LE GUEN V， THOME N. Disentangling physical dynamics from unknown factors for unsupervised video prediction ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 11471-11481. 10.1109/cvpr42600.2020.01149
43	LEE A X， ZHANG R， EBERT F， et al. Stochastic adversarial video prediction ［EB/OL］. ［2023-01-07］. .
44	KALCHBRENNER N， VAN DEN OORD A， SIMONYAN K， et al. Video pixel networks ［C］// Proceedings of the 34th International Conference on Machine Learning. New York： JMLR.org， 2017： 1771-1779.
45	DE BRABANDERE B， JIA X， TUYTELAARS T， et al. Dynamic filter networks ［C］// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc， 2016： 667-675.
46	OLIU M， SELVA J， ESCALERA S， et al. Folded recurrent neural networks for future video prediction ［C］// Proceedings of the 2018 European Conference on Computer Vision. Cham： Springer， 2018： 745-761. 10.1007/978-3-030-01264-9_44
47	ZHANG J， WANG Y， LONG M， et al. Z-order recurrent neural networks for video prediction ［C］// Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. Piscataway： IEEE， 2019： 230-235. 10.1109/icme.2019.00048
48	JIN B， HU Y， ZENG Y， et al. Exploring variations for unsupervised video prediction ［C］// Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway： IEEE， 2018： 5801-5806. 10.1109/iros.2018.8594264
49	JIN B， HU Y， TANG Q， et al. Exploring spatial-temporal multi-frequency analysis for high-fidelity and temporal-consistency video prediction ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 4553-4562. 10.1109/cvpr42600.2020.00461

[1]	尚绍法, 蒋林, 李远成, 朱筠. 异构平台下卷积神经网络推理模型自适应划分和调度方法[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2828-2835.
[2]	路琨婷, 费蓉蓉, 张选德. 融合卷积神经网络的遥感图像全色锐化[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2963-2969.
[3]	何子仪, 杨燕, 张熠玲. 深度融合多视图聚类网络[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2651-2656.
[4]	徐丽, 符祥远, 李浩然. 基于门控卷积的时空交通流预测模型[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2760-2765.
[5]	张涵钰, 李振波, 李蔚然, 杨普. 基于机器视觉的水产养殖计数研究综述[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2970-2982.
[6]	陈俊韬, 朱子奇. 基于多尺度特征提取与融合的图像复制-粘贴伪造检测[J]. 《计算机应用》唯一官方网站, 2023, 43(9): 2919-2924.
[7]	李校林, 杨松佳. 基于深度学习的多用户毫米波中继网络混合波束赋形[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2511-2516.
[8]	姜钧舰, 刘达维, 刘逸凡, 任酉贵, 赵志滨. 基于孪生网络的小样本目标检测算法[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2325-2329.
[9]	刘欢, 吴亮红, 张侣, 陈亮, 周博文, 张红强. 基于特征双融合CenterNet的白细胞检测方法[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2602-2610.
[10]	王一, 谢杰, 程佳, 豆立伟. 基于深度学习的RGB图像目标位姿估计综述[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2546-2555.
[11]	梁美佳, 刘昕武, 胡晓鹏. 基于改进YOLOv3的列车运行环境图像小目标检测算法[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2611-2618.
[12]	郭祥, 姜文刚, 王宇航. 基于改进Inception-ResNet的加密流量分类方法[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2471-2476.
[13]	崔雨萌, 王靖亚, 刘晓文, 闫尚义, 陶知众. 融合注意力和裁剪机制的通用文本分类模型[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2396-2405.
[14]	樊海玮, 鲁芯丝雨, 张丽苗, 安毅生. 融合知识图谱和图注意力网络的引文推荐算法[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2420-2425.
[15]	张琨, 杨丰玉, 钟发, 曾广东, 周世健. 基于混合代码表示的源代码脆弱性检测[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2517-2526.

结合内卷与卷积算子的视频预测模型

Video prediction model combining involution and convolution operators

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 17

参考文献 49

相关文章 15

编辑推荐

Metrics