Attention fusion network based video super-resolution reconstruction

doi:10.11772/j.issn.1001-9081.2020081292

Journal of Computer Applications, 2021, Vol. 41, Issue (4): 1012-1019. DOI: 10.11772/j.issn.1001-9081.2020081292

Special Issue: The 35th CCF National Conference of Computer Applications (CCF NCCA 2020)

BIAN Pengcheng, ZHENG Zhonglong, LI Minglu, HE Yiran, WANG Tianxiang, ZHANG Dawei, CHEN Liyuan

College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua Zhejiang 321004, China

Received: 2020-08-24 Revised: 2020-09-18 Online: 2020-11-05 Published: 2021-04-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61672467) and the Zhejiang Provincial Natural Science Foundation (LGG18F020017).

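Corresponding author: ZHENG Zhonglong
About the authors: BIAN Pengcheng (卞鹏程), born in 1993 in Lu'an, Anhui, M.S. candidate. His research interests include deep learning and computer vision. ZHENG Zhonglong (郑忠龙), born in 1976 in Cangzhou, Hebei, Ph.D., professor, CCF member. His research interests include pattern recognition, machine learning and image processing. LI Minglu (李明禄), born in 1965 in Chongqing, Ph.D., professor, CCF member. His research interests include cloud computing, vehicular ad hoc networks, wireless sensor networks and big data analysis. HE Yiran (何依然), born in 1996 in Hangzhou, Zhejiang, M.S. Her research interests include machine learning. WANG Tianxiang (王天翔), born in 1994 in Jinhua, Zhejiang, Ph.D. candidate. His research interests include machine learning and computer vision. ZHANG Dawei (张大伟), born in 1995 in Suqian, Jiangsu, Ph.D. candidate. His research interests include deep learning and computer vision. CHEN Liyuan (陈丽媛), born in 1994 in Jiaozuo, Henan, Ph.D. candidate. Her research interests include deep learning and computer vision.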

Abstract

Abstract: Deep learning based video super-resolution methods mainly exploit the inter-frame and intra-frame spatio-temporal relationships in a video, but existing methods still fall short in aligning and fusing the features of video frames: motion information is estimated inaccurately and features are fused insufficiently. To address these problems, a video super-resolution model based on an Attention Fusion Network (AFN) was constructed by using the back-projection principle and combining multiple attention mechanisms and fusion strategies. Firstly, at the feature extraction stage, a back-projection architecture was used to obtain error feedback on the motion information, in order to handle the multiple motions between the neighboring frames and the reference frame. Then, a temporal, spatial and channel attention fusion module was used to perform multi-dimensional feature mining and fusion. Finally, at the reconstruction stage, the obtained high-dimensional features were convolved to reconstruct the high-resolution video frames. By learning different weights for the features within and between video frames, the correlations between video frames were fully explored, and an iterative network structure was adopted to process the extracted features gradually from coarse to fine. Experimental results on two public benchmark datasets show that AFN can effectively process videos with multiple motions and occlusions, and achieves significant improvements in quantitative indicators over some mainstream methods. For instance, for the 4× reconstruction task, the Peak Signal-to-Noise Ratio (PSNR) of the frames reconstructed by AFN is 13.2% higher than that of the Frame Recurrent Video Super-Resolution network (FRVSR) on the Vid4 dataset and 15.3% higher than that of the Video Super-Resolution network using Dynamic Upsampling Filters (VSR-DUF) on the SPMCS dataset.
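
The quantitative comparison in the abstract is stated as a relative PSNR gain. As a rough illustration of how such a figure can be computed, the following minimal Python sketch evaluates PSNR from the mean squared error and then the percentage gain over a baseline. The function names, the flat-list frame representation, the 8-bit peak value of 255 and the toy pixel values are all assumptions made for illustration; the evaluation protocol actually used in the paper (for example, which color channel is measured or how borders are handled) is not described on this page.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames.

    Frames are flat sequences of pixel intensities (an assumption made
    here to keep the sketch free of third-party dependencies).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


def relative_gain_percent(psnr_new, psnr_baseline):
    """Relative PSNR improvement of a method over a baseline, in percent."""
    return 100.0 * (psnr_new - psnr_baseline) / psnr_baseline


# Toy 2x2 frames, flattened. The values are illustrative only and are
# not taken from the paper or from the Vid4 / SPMCS datasets.
ground_truth = [52, 55, 61, 59]
baseline_out = [50, 57, 58, 62]
improved_out = [52, 56, 60, 59]

psnr_baseline = psnr(ground_truth, baseline_out)
psnr_improved = psnr(ground_truth, improved_out)
gain = relative_gain_percent(psnr_improved, psnr_baseline)
print(f"baseline: {psnr_baseline:.2f} dB, improved: {psnr_improved:.2f} dB, gain: {gain:.1f}%")
```

Because PSNR is a logarithmic quantity, a percentage gain over a baseline depends on the baseline's absolute value, so reporting the two dB values alongside the percentage makes a comparison easier to interpret.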

Key words: super-resolution, attention mechanism, feature fusion, back-projection, video reconstruction

CLC Number: TP391.4

BIAN Pengcheng, ZHENG Zhonglong, LI Minglu, HE Yiran, WANG Tianxiang, ZHANG Dawei, CHEN Liyuan. Attention fusion network based video super-resolution reconstruction[J]. Journal of Computer Applications, 2021, 41(4): 1012-1019.
