Video super-resolution reconstruction network based on frame straddling optical flow

doi:10.11772/j.issn.1001-9081.2023040523

Journal of Computer Applications, 2024, Vol. 44, Issue (4): 1277-1284.

Special topic: Multimedia computing and computer simulation

Yang LIU, Rong LIU, Ke FANG, Xinyue ZHANG, Guangxu WANG

College of Physical Science and Technology, Central China Normal University, Wuhan Hubei 430079, China

Received: 2023-05-05 Revised: 2023-08-18 Accepted: 2023-08-24 Online: 2023-12-04 Published: 2024-04-10
Contact: Rong LIU (liurong@ccnu.edu.cn)
About the authors:
LIU Yang, born in 1999, from Changsha, Hunan, M. S. candidate. His research interests include computer vision.
LIU Rong, born in 1969, from Anhua, Hunan, Ph. D., associate professor. Her research interests include intelligent information processing and artificial intelligence.
FANG Ke, born in 1999, from Zhoukou, Henan, M. S. candidate. His research interests include computer vision.
ZHANG Xinyue, born in 1998, from Zhoukou, Henan, M. S. candidate. Her research interests include natural language processing.
WANG Guangxu, born in 1999, from Xiangyang, Hubei, M. S. candidate. His research interests include natural language processing.
Supported by:
National Social Science Foundation of China (22ATQ004); Cross Disciplinary Scientific Research Projects of Central China Normal University (CCNU22JC033)

Abstract

In complex scenes with large motion, current Video Super-Resolution (VSR) algorithms cannot fully exploit inter-frame information at different temporal distances when processing long sequences, which makes it difficult to accurately recover occluded regions, boundaries, and detail-rich regions. To address these problems, a VSR model based on frame straddling optical flow is proposed. First, shallow features of the Low-Resolution (LR) frames are extracted by Residual Dense Blocks (RDBs). Second, a Spatial Pyramid Network (SPyNet) performs motion estimation and motion compensation on the video frames using straddling optical flows over different time spans, and RDBs connected in multiple layers extract and correct deep features from the inter-frame information. Finally, the shallow and deep features are fused, and High-Resolution (HR) frames are obtained by up-sampling. Experimental results on the public REDS4 dataset show that, compared with the classic deep Video Super-Resolution network using Dynamic Upsampling Filters without explicit motion compensation (DUF-VSR), the proposed model improves Peak Signal-to-Noise Ratio (PSNR) by 1.07 dB and Structural Similarity Index Measure (SSIM) by 0.06, indicating that it can effectively improve the quality of video reconstruction.

Key words: Video Super-Resolution (VSR) algorithm, optical flow, motion compensation, Residual Dense Block (RDB), deep feature
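The motion compensation step mentioned in the abstract can be illustrated with a generic backward warp. The following is a minimal sketch, not the paper's implementation: it assumes the flow field is already given (the model itself estimates it with SPyNet), works on a single-channel image stored as nested lists, and clamps out-of-range samples to the border. The name `warp_frame` and the border handling are assumptions made for illustration.

```python
import math

def warp_frame(frame, flow):
    """Backward-warp `frame` toward the target frame using a dense flow field.

    frame: H x W nested list of floats (single-channel image).
    flow:  H x W nested list of (dx, dy) tuples; the output pixel (x, y)
           is sampled from `frame` at (x + dx, y + dy).
    Samples that fall outside the image are clamped to the nearest border.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Source coordinates, clamped to the image bounds.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out

# Usage: a uniform flow of one pixel in x pulls each value from its right neighbour.
frame = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
shift = [[(1.0, 0.0)] * 3 for _ in range(2)]
print(warp_frame(frame, shift))  # [[1.0, 2.0, 2.0], [4.0, 5.0, 5.0]]
```

With flows computed between frames that are more than one step apart, the same warp would align a distant frame to the current one; how the paper combines such flows is not specified on this page.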

CLC Number: TP391.4

Yang LIU, Rong LIU, Ke FANG, Xinyue ZHANG, Guangxu WANG. Video super-resolution reconstruction network based on frame straddling optical flow[J]. Journal of Computer Applications, 2024, 44(4): 1277-1284.

Figures/Tables (17)

Fig. 1 Network structure of proposed model

Fig. 2 Pre-processing module structure

Fig. 3 Frame straddling optical flow processing module structure

Fig. 4 Ffb module structure

Fig. 5 Feature correction module structure

Fig. 6 Structures of Fff and Fsb modules

Fig. 7 Fsf module structure

Fig. 8 Up-sampling reconstruction module structure

Fig. 9 Comparison of reconstruction results of different models on frame 83 of REDS4-015

Fig. 10 Comparison of reconstruction results of different models on frame 31 of Vid4-walk

Tab. 1 Comparison of results of different models on four datasets

Model	Runtime/ms	Parameters/10⁶	Computation/GFLOPs	REDS4 PSNR/dB	REDS4 SSIM	Vid4 PSNR/dB	Vid4 SSIM	SPMC PSNR/dB	SPMC SSIM	UDM10 PSNR/dB	UDM10 SSIM
Bicubic	-	-	-	26.14	0.72	21.80	0.52	23.29	0.64	28.47	0.82
EDVR-m	156	3.3	200.0	28.11	0.80	23.54	0.69	27.27	0.75	37.42	0.93
DUF-VSR	559	5.8	736.6	29.40	0.81	25.01	0.73	27.96	0.79	37.97	0.93
BasicVSR	1306	6.3	163.7	30.21	0.85	24.94	0.74	28.52	0.82	38.20	0.94
RawVSR	1610	4.5	356.9	30.20	0.86	25.04	0.75	28.54	0.84	38.44	0.96
OVSR	1152	1.8	110.0	30.17	0.85	25.15	0.79	28.57	0.83	38.18	0.94
Proposed model	1544	6.3	336.4	30.47	0.87	25.17	0.79	28.59	0.83	38.31	0.95
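The gains quoted in the abstract can be checked against the REDS4 columns of Tab. 1. The short sketch below copies those values and computes the difference between the proposed model and each baseline; the helper name `gain_over` and the dictionary layout are illustrative choices, not part of the paper.

```python
# (PSNR in dB, SSIM) on REDS4, copied from Tab. 1.
REDS4 = {
    "Bicubic": (26.14, 0.72),
    "EDVR-m": (28.11, 0.80),
    "DUF-VSR": (29.40, 0.81),
    "BasicVSR": (30.21, 0.85),
    "RawVSR": (30.20, 0.86),
    "OVSR": (30.17, 0.85),
    "Proposed": (30.47, 0.87),
}

def gain_over(baseline, table=REDS4, model="Proposed"):
    """Return (PSNR gain in dB, SSIM gain) of `model` over `baseline`."""
    psnr_m, ssim_m = table[model]
    psnr_b, ssim_b = table[baseline]
    # Round to two decimals, the precision reported in the table.
    return round(psnr_m - psnr_b, 2), round(ssim_m - ssim_b, 2)

for name in REDS4:
    if name != "Proposed":
        print(name, gain_over(name))
```

For DUF-VSR this gives (1.07, 0.06), matching the abstract; against BasicVSR the margin on REDS4 is smaller, at 0.26 dB and 0.02.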

Fig. 11 Comparison of reconstruction results under occlusion

Fig. 12 Comparison of reconstruction results in boundary regions

Fig. 13 Comparison of reconstruction results in detail-rich regions

Tab. 2 Performance comparison of different models on REDS4 dataset

Model	Runtime/ms	Parameters/10⁶	Computation/GFLOPs	REDS4-000 PSNR/dB	REDS4-000 SSIM	REDS4-011 PSNR/dB	REDS4-011 SSIM	REDS4-015 PSNR/dB	REDS4-015 SSIM	REDS4-020 PSNR/dB	REDS4-020 SSIM
ModelX	1129	4.6	192.2	27.72	0.80	30.13	0.85	32.16	0.88	29.18	0.85
BasicVSR	1306	6.3	163.7	27.67	0.81	30.83	0.87	32.75	0.90	29.61	0.87
Proposed model	1544	6.3	336.4	27.83	0.82	31.13	0.88	33.12	0.91	29.82	0.88

Tab. 3 Effect of the numbers of pre-processing residual blocks (P) and feature correction residual blocks (R) on performance on REDS4 dataset

P	R	PSNR/dB	SSIM	Runtime/ms	Parameters/10⁶	Computation/GFLOPs
4	6	30.38	0.85	1624	7.1	439.4
6	6	30.26	0.83	1731	7.7	484.0
3	6	30.41	0.87	1558	6.8	405.8
3	4	30.17	0.83	1436	5.8	267.0
3	5	30.47	0.87	1544	6.3	336.4

Tab. 4 Validation experimental results of different modules on REDS4 dataset

Model	5-frame forward branch	3-frame backward branch	Pre-processing	Long-distance reuse	PSNR/dB	SSIM	Runtime/ms	Parameters/10⁶	Computation/GFLOPs
Model1					29.92	0.81	1066	3.7	125.2
Model2	√				30.00	0.81	1282	4.5	183.1
Model3	√	√			30.25	0.83	1481	5.4	270.7
Model4	√	√	√		30.33	0.84	1543	6.3	336.4
Proposed model	√	√	√	√	30.47	0.87	1544	6.3	336.4
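Because each row of Tab. 4 enables one more module than the row before it, the PSNR contribution of each module can be read off as a difference between consecutive rows. The sketch below does this arithmetic on the table's values; the row labels and the function name `incremental_gains` are illustrative, and the differences only reflect this particular order of adding modules.

```python
# (configuration, PSNR in dB on REDS4), in the order listed in Tab. 4.
ABLATION = [
    ("Model1", 29.92),
    ("Model2: + 5-frame forward branch", 30.00),
    ("Model3: + 3-frame backward branch", 30.25),
    ("Model4: + pre-processing", 30.33),
    ("Proposed: + long-distance reuse", 30.47),
]

def incremental_gains(rows):
    """PSNR gain of each configuration over the one listed before it."""
    return [
        (name, round(psnr - prev_psnr, 2))
        for (_, prev_psnr), (name, psnr) in zip(rows, rows[1:])
    ]

gains = incremental_gains(ABLATION)
total = round(ABLATION[-1][1] - ABLATION[0][1], 2)
for name, delta in gains:
    print(f"{name}: +{delta:.2f} dB")
print(f"Total over Model1: +{total:.2f} dB")
```

In this ordering the 3-frame backward branch accounts for the largest single step (0.25 dB), while long-distance reuse adds 0.14 dB with essentially unchanged runtime, parameter count, and computation compared with Model4.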

