Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (3): 819-824. DOI: 10.11772/j.issn.1001-9081.2019081474

• Virtual Reality and Multimedia Computing •

  • Corresponding author: GAO Qinquan
  • About the authors: LIN Chuanjian (1994-), male, born in Ningde, Fujian, M.S. candidate, research interests: computer vision, image processing; DENG Wei (1992-), male, born in Longyan, Fujian, M.S., research interests: computer vision, image processing; TONG Tong (1986-), male, born in Anqing, Anhui, research fellow, Ph.D., research interests: artificial intelligence and computer vision, medical image processing and analysis; GAO Qinquan (1986-), male, born in Fuqing, Fujian, associate research fellow, Ph.D., research interests: artificial intelligence and computer vision, medical image processing and analysis, computer-assisted surgical navigation.

Blurred video frame interpolation method based on deep voxel flow

LIN Chuanjian1,2, DENG Wei3, TONG Tong3, GAO Qinquan1,2,3   

  1. College of Physics and Information Engineering, Fuzhou University, Fuzhou Fujian 350116, China;
    2. Key Lab of Medical Instrumentation and Pharmaceutical Technology of Fujian Province(Fuzhou University), Fuzhou Fujian 350116, China;
    3. Imperial Vision Technology Company Limited, Fuzhou Fujian 350116, China
  • Received: 2019-08-29 Revised: 2019-10-23 Online: 2020-03-10 Published: 2019-11-06
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61802065).



Abstract: Motion blur severely degrades the quality of video frame interpolation. To address this problem, a novel blurred video frame interpolation method was proposed. Firstly, a multi-task fusion convolutional neural network was designed, consisting of two modules: a deblurring module and a frame interpolation module. In the deblurring module, a deep Convolutional Neural Network (CNN) built from stacked residual blocks extracts and learns deep blur features to remove the motion blur from the two input frames. The frame interpolation module then estimates the voxel flow between the two deblurred frames, and the obtained voxel flow guides trilinear interpolation of the pixels to synthesize the intermediate frame. Secondly, a large simulated blurred video dataset was constructed, and a separate-then-joint, coarse-to-fine training strategy was proposed; experimental results show that this strategy promotes effective convergence of the multi-task fusion network. Finally, compared with combinations of state-of-the-art deblurring and frame interpolation algorithms, the proposed method synthesizes intermediate frames with the Peak Signal-to-Noise Ratio (PSNR) increased by at least 1.41 dB, the Structural SIMilarity (SSIM) improved by at least 0.020, and the Interpolation Error (IE) reduced by at least 1.99. Visual comparisons and remastered sequences show that the proposed model achieves a significant frame-rate up-conversion effect on blurred videos; in other words, two consecutive blurred frames can be reconstructed end-to-end into three sharp and visually coherent frames.
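The voxel-flow-guided trilinear interpolation described in the abstract can be sketched as follows. This is a minimal NumPy illustration, under the simplifying assumptions that the network predicts a per-pixel spatial displacement (dx, dy) and a temporal weight dt for each output pixel; the function names and the exact warping convention are illustrative, not taken from the authors' implementation. Trilinear interpolation here decomposes into bilinear spatial sampling in each source frame plus a linear temporal blend.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear sampling of a (H, W) image at float coordinates (x, y)."""
    H, W = img.shape
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - wx) + img[y0, x0 + 1] * wx
    bot = img[y0 + 1, x0] * (1 - wx) + img[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy

def trilinear_sample(f0, f1, flow, dt):
    """
    Synthesize an intermediate frame from two (H, W) frames.

    flow: (H, W, 2) per-pixel displacement (dx, dy) across the frame pair,
          as a voxel-flow network might predict it (assumed layout).
    dt:   (H, W) temporal weight in [0, 1]; 0.5 places the frame midway.
    """
    H, W = f0.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    dx, dy = flow[..., 0], flow[..., 1]

    # Backward-warp: sample frame 0 behind the motion and frame 1 ahead of it.
    mid0 = bilinear(f0, xs - dt * dx, ys - dt * dy)
    mid1 = bilinear(f1, xs + (1 - dt) * dx, ys + (1 - dt) * dy)

    # Linear temporal blend completes the trilinear interpolation.
    return (1 - dt) * mid0 + dt * mid1
```

With zero flow and dt = 0.5 the result is simply the average of the two frames; in the paper's pipeline the flow and temporal weights are learned end-to-end together with the deblurring module, so the sampling adapts to the actual motion.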

Key words: deep learning, Convolutional Neural Network (CNN), deblurring, voxel flow, video frame interpolation

CLC number: