Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (4): 1012-1019. DOI: 10.11772/j.issn.1001-9081.2020081292

Special Topic: The 35th CCF National Conference of Computer Applications (CCF NCCA 2020)


Attention fusion network based video super-resolution reconstruction

BIAN Pengcheng, ZHENG Zhonglong, LI Minglu, HE Yiran, WANG Tianxiang, ZHANG Dawei, CHEN Liyuan

  1. College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
  • Received: 2020-08-24  Revised: 2020-09-18  Online: 2021-04-10  Published: 2020-11-05
  • Corresponding author: ZHENG Zhonglong
  • About the authors: BIAN Pengcheng (1993—), male, born in Lu'an, Anhui, M.S. candidate; his research interests include deep learning and computer vision. ZHENG Zhonglong (1976—), male, born in Cangzhou, Hebei, Ph.D., professor, CCF member; his research interests include pattern recognition, machine learning, and image processing. LI Minglu (1965—), male, born in Chongqing, Ph.D., professor, CCF member; his research interests include cloud computing, vehicular ad hoc networks, wireless sensor networks, and big data analysis. HE Yiran (1996—), female, born in Hangzhou, Zhejiang, M.S.; her research interests include machine learning. WANG Tianxiang (1994—), male, born in Jinhua, Zhejiang, Ph.D. candidate; his research interests include machine learning and computer vision. ZHANG Dawei (1995—), male, born in Suqian, Jiangsu, Ph.D. candidate; his research interests include deep learning and computer vision. CHEN Liyuan (1994—), female, born in Jiaozuo, Henan, Ph.D. candidate; her research interests include deep learning and computer vision.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672467) and the Zhejiang Provincial Natural Science Foundation (LGG18F020017).


Abstract: Video super-resolution methods based on deep learning mainly focus on the intra-frame and inter-frame spatio-temporal relationships in a video, but previous methods have shortcomings in the feature alignment and fusion of video frames, such as inaccurate motion estimation and insufficient feature fusion. To address these problems, a video super-resolution model based on an Attention Fusion Network (AFN) was constructed by combining the back-projection principle with multiple attention mechanisms and fusion strategies. First, at the feature extraction stage, the back-projection architecture was used to obtain error feedback on motion information, in order to handle the various motions between neighboring frames and the reference frame. Then, a temporal-spatial-channel attention fusion module was used to perform multi-dimensional feature mining and fusion. Finally, at the reconstruction stage, the resulting high-dimensional features were passed through convolutional layers to reconstruct high-resolution video frames. By learning different weights for features within and between video frames, the correlations between video frames were fully exploited, and an iterative network structure was adopted to process the extracted features progressively from coarse to fine. Experimental results on two public benchmark datasets show that AFN can effectively handle videos with multiple motions and occlusions, and achieves large gains in quantitative metrics over several mainstream methods. For instance, for the 4× reconstruction task, the Peak Signal-to-Noise Ratio (PSNR) of frames reconstructed by AFN is 13.2% higher than that of the Frame-Recurrent Video Super-Resolution network (FRVSR) on the Vid4 dataset and 15.3% higher than that of the Video Super-Resolution network using Dynamic Upsampling Filters (VSR-DUF) on the SPMCS dataset.
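To make the fusion step concrete, below is a minimal PyTorch sketch of what a temporal-spatial-channel attention fusion module of the kind the abstract describes can look like. This is an illustration, not the authors' implementation: the module layout, channel sizes, and the exact weighting schemes (per-pixel reference similarity for temporal attention, a learned one-channel map for spatial attention, squeeze-and-excitation-style gating for channel attention) are our assumptions, and it presumes the neighbor-frame features have already been aligned to the reference frame.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Temporal-spatial-channel attention fusion over aligned frame features."""
    def __init__(self, channels=64, num_frames=5):
        super().__init__()
        # Embedding used to score each neighbor frame against the reference.
        self.embed = nn.Conv2d(channels, channels, 3, padding=1)
        # Spatial attention: a one-channel map of informative regions.
        self.spatial = nn.Conv2d(channels, 1, 3, padding=1)
        # Channel attention: squeeze-and-excitation style global gating.
        self.channel = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )
        # 1x1 convolution fusing the weighted frames into one feature map.
        self.fuse = nn.Conv2d(num_frames * channels, channels, 1)

    def forward(self, feats, ref):
        # feats: (B, T, C, H, W) neighbor features aligned to the reference;
        # ref:   (B, C, H, W) reference-frame features; requires T == num_frames.
        b, t, c, h, w = feats.shape
        ref_emb = self.embed(ref)
        weighted = []
        for i in range(t):
            emb = self.embed(feats[:, i])
            # Temporal weight: per-pixel similarity to the reference frame.
            w_t = torch.sigmoid((emb * ref_emb).sum(dim=1, keepdim=True))
            f = feats[:, i] * w_t
            # Spatial weight: highlight informative spatial positions.
            f = f * torch.sigmoid(self.spatial(f))
            # Channel weight: gate channels using globally pooled statistics.
            g = self.channel(f.mean(dim=(2, 3))).view(b, c, 1, 1)
            weighted.append(f * g)
        return self.fuse(torch.cat(weighted, dim=1))

For example, AttentionFusion()(torch.randn(2, 5, 64, 32, 32), torch.randn(2, 64, 32, 32)) returns a (2, 64, 32, 32) tensor that a reconstruction stage could upsample into a high-resolution frame.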

Key words: super-resolution, attention mechanism, feature fusion, back-projection, video reconstruction
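For readers interpreting the percentage figures above, PSNR is the standard metric PSNR = 10·log10(MAX^2 / MSE), where MAX is the peak pixel value and MSE is the mean squared error between the ground-truth and reconstructed frames. A minimal NumPy sketch of this standard definition (not code from the paper):

import numpy as np

def psnr(gt, pred, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two frames of equal shape."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)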
